--- tags: - text-generation-inference - transformers - unsloth - qwen3_moe license: apache-2.0 language: - en datasets: - Tesslate/Gradient-Reasoning - Daemontatox/natural_reasoning - Daemontatox/numina_math_cconvs - Daemontatox/curated_thoughts_convs library_name: transformers base_model: - Qwen/Qwen3-30B-A3B --- # Mini-Hydra ![image](./Image.jpg)
Model on Hugging Face
A specialized reasoning-focused MoE model based on Qwen3-30B-A3B
## Model Details ### Model Description Mini-Hydra is a Mixture-of-Experts (MoE) language model designed for efficient reasoning and faster conclusion generation. Built upon the Qwen3-30B-A3B architecture, this model aims to bridge the performance gap between sparse MoE models and their dense counterparts while maintaining computational efficiency. - **Developed by:** Daemontatox - **Model type:** Mixture-of-Experts (MoE) Language Model - **Architecture:** Qwen3-30B-A3B based - **Activated Parameters:** 3 billion - **Total Parameters:** ~30 billion (with MoE routing) - **Language(s):** English (primary), with multilingual capabilities inherited from base model - **License:** [Apache 2.0] - **Finetuned from model:** Qwen3-30B-A3B ### Model Sources - **Repository:** https://huggingface.co/Daemontatox/Mini-Hydra - **Base Model:** Qwen3-30B-A3B - **Training Datasets:** - [Tesslate/Gradient-Reasoning](https://huggingface.co/datasets/Tesslate/Gradient-Reasoning) - [Daemontatox/curated_thoughts_convs](https://huggingface.co/datasets/Daemontatox/curated_thoughts_convs) - [Daemontatox/natural_reasoning](https://huggingface.co/datasets/Daemontatox/natural_reasoning) - [Daemontatox/numina_math_cconvs](https://huggingface.co/datasets/Daemontatox/numina_math_cconvs) ## Uses ### Direct Use Mini-Hydra is designed for applications requiring: - **Efficient reasoning:** Optimized for logical problem-solving with reduced computational overhead - **Mathematical reasoning:** Enhanced performance on mathematical problems and proofs - **Conversational AI:** Natural dialogue with reasoning capabilities - **Code generation:** Programming assistance with logical reasoning steps - **Educational applications:** Tutoring and explanation generation ### Downstream Use The model can be further fine-tuned for specific domains such as: - Domain-specific reasoning (legal, medical, scientific) - Specialized mathematical problem solving - Custom conversational agents - Educational content generation ### Out-of-Scope Use This model is not intended for: - Production systems requiring 100% accuracy without human oversight - Generating harmful, biased, or inappropriate content - Real-time applications requiring sub-second response times - Applications where model hallucination could cause harm ## Bias, Risks, and Limitations ### Known Limitations 1. **Training Constraints:** Due to resource limitations, the model received less training than originally planned, which may impact performance in some scenarios. 2. **Reasoning Scope:** While optimized for reasoning, the model may still struggle with very complex multi-step logical problems. 3. **Language Bias:** Primary training on English may lead to reduced performance in other languages. 4. **Knowledge Cutoff:** The model's knowledge is limited to the training data cutoff date. ### Potential Risks - **Hallucination:** Like all language models, Mini-Hydra may generate plausible-sounding but incorrect information - **Bias:** May reflect biases present in training data - **Overconfidence:** May present uncertain information with high confidence ### Recommendations - Always verify critical information from reliable sources - Use appropriate safety measures and human oversight for important applications - Consider the model's limitations when deploying in production environments ## Training Details ### Training Data The model was trained on a carefully curated combination of reasoning-focused datasets: 1. **Tesslate/Gradient-Reasoning:** Advanced reasoning problems with step-by-step solutions 2. **Daemontatox/curated_thoughts_convs:** Curated conversational data emphasizing thoughtful responses 3. **Daemontatox/natural_reasoning:** Natural language reasoning examples and explanations 4. **Daemontatox/numina_math_cconvs:** Mathematical conversation and problem-solving data ### Training Procedure - **Base Model:** Qwen3-30B-A3B - **Training Objective:** Optimized for efficient reasoning and faster conclusion generation - **Architecture:** Mixture-of-Experts with 3B activated parameters - **Training Constraint:** Limited by resource availability, resulting in abbreviated training cycle ### Training Infrastructure - **Hardware:** [2 A100 GPUs] - **Training Time:** [72 hrs] - **Compute Resources:** Resource-constrained environment ## Evaluation ### Testing Data, Factors & Metrics The model's performance should be evaluated on: - **Reasoning Benchmarks:** GSM8K, MATH, LogiQA - **General Language Tasks:** MMLU, HellaSwag, ARC - **Efficiency Metrics:** Inference speed, memory usage - **Reasoning Quality:** Step-by-step problem solving accuracy ### Results [Note: Specific benchmark results would be added here once available] The model demonstrates: - Improved reasoning efficiency compared to dense models of similar size - Competitive performance despite resource-constrained training - Faster inference times due to MoE architecture ## Technical Specifications ### Model Architecture - **Base:** Qwen3-30B-A3B MoE architecture - **Experts:** Multiple expert networks with routing mechanism - **Activated Parameters:** 3 billion per forward pass - **Total Parameters:** ~30 billion - **Context Length:** [Inherited from base model - likely 32K tokens] - **Vocabulary Size:** [Inherited from base model] ### Compute Infrastructure - **Training:** Resource-constrained environment - **Inference:** Optimized for efficiency with 3B activated parameters - **Memory Requirements:** Significantly reduced compared to equivalent dense models ## How to Use ### Installation ```bash pip install transformers torch accelerate ``` ### Basic Usage ```python from transformers import AutoTokenizer, AutoModelForCausalLM import torch # Load model and tokenizer model_name = "Daemontatox/Mini-Hydra" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True ) # Example inference def generate_response(prompt, max_length=512): inputs = tokenizer.encode(prompt, return_tensors="pt") with torch.no_grad(): outputs = model.generate( inputs, max_length=max_length, num_return_sequences=1, temperature=0.7, do_sample=True, pad_token_id=tokenizer.eos_token_id ) response = tokenizer.decode(outputs[0], skip_special_tokens=True) return response[len(prompt):].strip() # Example usage prompt = "Solve this step by step: If a train travels 120 miles in 2 hours, and then 180 miles in 3 hours, what is the average speed for the entire journey?" response = generate_response(prompt) print(response) ``` ### Advanced Usage with Custom Parameters ```python from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig import torch model_name = "Daemontatox/Mini-Hydra" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True ) # Custom generation configuration for reasoning tasks generation_config = GenerationConfig( temperature=0.1, # Lower temperature for more focused reasoning top_p=0.9, top_k=50, repetition_penalty=1.1, max_length=1024, do_sample=True, pad_token_id=tokenizer.eos_token_id ) def reasoning_generate(prompt, system_prompt="Think step by step and provide a clear reasoning process."): full_prompt = f"{system_prompt}\n\nProblem: {prompt}\n\nSolution:" inputs = tokenizer.encode(full_prompt, return_tensors="pt") with torch.no_grad(): outputs = model.generate( inputs, generation_config=generation_config ) response = tokenizer.decode(outputs[0], skip_special_tokens=True) return response[len(full_prompt):].strip() # Example reasoning problem math_problem = """ A rectangular garden has a length that is 3 times its width. If the perimeter is 32 meters, what are the dimensions of the garden? """ solution = reasoning_generate(math_problem) print(solution) ``` ### Batch Processing ```python def batch_reasoning(prompts, batch_size=4): results = [] for i in range(0, len(prompts), batch_size): batch_prompts = prompts[i:i+batch_size] batch_inputs = tokenizer( batch_prompts, return_tensors="pt", padding=True, truncation=True ) with torch.no_grad(): batch_outputs = model.generate( **batch_inputs, max_length=512, temperature=0.7, do_sample=True, pad_token_id=tokenizer.eos_token_id ) batch_responses = tokenizer.batch_decode(batch_outputs, skip_special_tokens=True) results.extend(batch_responses) return results # Example batch processing problems = [ "What is 15% of 240?", "If x + 5 = 12, what is x?", "A circle has radius 7. What is its area?", "Solve: 2x + 3 = 15" ] solutions = batch_reasoning(problems) for problem, solution in zip(problems, solutions): print(f"Problem: {problem}") print(f"Solution: {solution}") print("-" * 50) ``` ## Example Outputs ### Mathematical Reasoning ```python Input: "Solve this step by step: If a train travels 120 miles in 2 hours, and then 180 miles in 3 hours, what is the average speed for the entire journey?" Output: "Let me solve this step by step: Step 1: Calculate total distance Total distance = 120 miles + 180 miles = 300 miles Step 2: Calculate total time Total time = 2 hours + 3 hours = 5 hours Step 3: Calculate average speed Average speed = Total distance ÷ Total time Average speed = 300 miles ÷ 5 hours = 60 miles per hour Therefore, the average speed for the entire journey is 60 mph." ``` ### Logical Reasoning ```python Input: "If all roses are flowers, and some flowers are red, can we conclude that some roses are red?" Output: "Let me analyze this logical statement: Given premises: 1. All roses are flowers 2. Some flowers are red The question asks if we can conclude that some roses are red. Analysis: - From premise 1: roses ⊆ flowers (roses are a subset of flowers) - From premise 2: ∃ flowers that are red However, we cannot definitively conclude that some roses are red because: - The red flowers mentioned in premise 2 might be flowers other than roses - We have no information directly connecting roses to the property of being red Conclusion: No, we cannot logically conclude that some roses are red based solely on the given premises." ``` ### Code Generation with Reasoning ```python Input: "Write a Python function to find the factorial of a number, and explain your approach." Output: "Here's a Python function to calculate factorial with explanation: def factorial(n): """ Calculate the factorial of a non-negative integer n. Args: n (int): Non-negative integer Returns: int: n! (factorial of n) Raises: ValueError: If n is negative """ if n < 0: raise ValueError("Factorial is not defined for negative numbers") elif n == 0 or n == 1: return 1 else: result = 1 for i in range(2, n + 1): result *= i return result ``` ## Reasoning behind the approach: 1. Input validation: Check for negative numbers since factorial is undefined for them 2. Base cases: 0! = 1 and 1! = 1 by mathematical definition 3. Iterative calculation: For n > 1, multiply all integers from 2 to n 4. This iterative approach is more memory-efficient than recursion for large numbers ## Example usage: ```python print(factorial(5)) # Output: 120 print(factorial(0)) # Output: 1 ``` ## Model Card Authors **Primary Author:** Daemontatox ## Model Card Contact For questions, issues, or collaboration opportunities, please contact through the Hugging Face model repository. ## Citation ```bibtex @misc{mini-hydra-2024, title={Mini-Hydra: Efficient Reasoning with Mixture-of-Experts}, author={Daemontatox}, year={2024}, publisher={Hugging Face}, howpublished={\\url{https://huggingface.co/Daemontatox/Mini-Hydra}}, note={Based on Qwen3-30B-A3B architecture} } ``` --- *This model card follows the guidelines established by the Hugging Face Model Card framework and includes technical details, usage examples, and important limitations to ensure responsible use of the model.*