|
--- |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- qwen3_moe |
|
license: apache-2.0 |
|
language: |
|
- en |
|
datasets: |
|
- Tesslate/Gradient-Reasoning |
|
- Daemontatox/natural_reasoning |
|
- Daemontatox/numina_math_cconvs |
|
- Daemontatox/curated_thoughts_convs |
|
library_name: transformers |
|
base_model: |
|
- Qwen/Qwen3-30B-A3B |
|
--- |
|
|
|
# Mini-Hydra |
|
|
|
|
|
|
<div align="center"> |
|
<img src="https://huggingface.co/spaces/huggingfacejs/badges/resolve/main/model-on-hf-md-dark.svg" alt="Model on Hugging Face"> |
|
<br> |
|
<strong>A specialized reasoning-focused MoE model based on Qwen3-30B-A3B</strong> |
|
</div> |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
Mini-Hydra is a Mixture-of-Experts (MoE) language model designed for efficient reasoning: it aims to reach sound conclusions in fewer generation steps. Built on the Qwen3-30B-A3B architecture, it aims to narrow the performance gap between sparse MoE models and their dense counterparts while retaining the computational efficiency of sparse activation.
|
|
|
- **Developed by:** Daemontatox |
|
- **Model type:** Mixture-of-Experts (MoE) Language Model |
|
- **Architecture:** Qwen3-30B-A3B based |
|
- **Activated Parameters:** ~3 billion per forward pass

- **Total Parameters:** ~30 billion (sparsely activated via MoE routing)
|
- **Language(s):** English (primary), with multilingual capabilities inherited from base model |
|
- **License:** Apache 2.0
|
- **Finetuned from model:** Qwen3-30B-A3B |
|
|
|
### Model Sources |
|
|
|
- **Repository:** https://huggingface.co/Daemontatox/Mini-Hydra |
|
- **Base Model:** [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B)
|
- **Training Datasets:** |
|
- [Tesslate/Gradient-Reasoning](https://huggingface.co/datasets/Tesslate/Gradient-Reasoning) |
|
- [Daemontatox/curated_thoughts_convs](https://huggingface.co/datasets/Daemontatox/curated_thoughts_convs) |
|
- [Daemontatox/natural_reasoning](https://huggingface.co/datasets/Daemontatox/natural_reasoning) |
|
- [Daemontatox/numina_math_cconvs](https://huggingface.co/datasets/Daemontatox/numina_math_cconvs) |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
Mini-Hydra is designed for applications requiring: |
|
- **Efficient reasoning:** Optimized for logical problem-solving with reduced computational overhead |
|
- **Mathematical reasoning:** Enhanced performance on mathematical problems and proofs |
|
- **Conversational AI:** Natural dialogue with reasoning capabilities |
|
- **Code generation:** Programming assistance with logical reasoning steps |
|
- **Educational applications:** Tutoring and explanation generation |
|
|
|
### Downstream Use |
|
|
|
The model can be further fine-tuned for specific domains such as the following (a LoRA sketch appears after the list):
|
- Domain-specific reasoning (legal, medical, scientific) |
|
- Specialized mathematical problem solving |
|
- Custom conversational agents |
|
- Educational content generation |
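As a starting point for such fine-tuning, here is a minimal LoRA sketch using `peft`. The rank, alpha, and `target_modules` values are illustrative assumptions, not the recipe used to train Mini-Hydra; verify the projection names against `model.named_modules()` for the Qwen3-MoE architecture.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hedged sketch: LoRA adapter setup for further domain fine-tuning.
# target_modules are an assumption for the Qwen3-MoE attention layers;
# verify the actual module names with model.named_modules().
model = AutoModelForCausalLM.from_pretrained("Daemontatox/Mini-Hydra", device_map="auto")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights train
```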
|
|
|
### Out-of-Scope Use |
|
|
|
This model is not intended for: |
|
- Production systems requiring 100% accuracy without human oversight |
|
- Generating harmful, biased, or inappropriate content |
|
- Real-time applications requiring sub-second response times |
|
- Applications where model hallucination could cause harm |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
### Known Limitations |
|
|
|
1. **Training Constraints:** Due to resource limitations, the model received less training than originally planned, which may impact performance in some scenarios. |
|
|
|
2. **Reasoning Scope:** While optimized for reasoning, the model may still struggle with very complex multi-step logical problems. |
|
|
|
3. **Language Bias:** Primary training on English may lead to reduced performance in other languages. |
|
|
|
4. **Knowledge Cutoff:** The model's knowledge is limited to the training data cutoff date. |
|
|
|
### Potential Risks |
|
|
|
- **Hallucination:** Like all language models, Mini-Hydra may generate plausible-sounding but incorrect information |
|
- **Bias:** May reflect biases present in training data |
|
- **Overconfidence:** May present uncertain information with high confidence |
|
|
|
### Recommendations |
|
|
|
- Always verify critical information from reliable sources |
|
- Use appropriate safety measures and human oversight for important applications |
|
- Consider the model's limitations when deploying in production environments |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The model was trained on a carefully curated combination of reasoning-focused datasets (a loading sketch follows the list):
|
|
|
1. **Tesslate/Gradient-Reasoning:** Advanced reasoning problems with step-by-step solutions |
|
2. **Daemontatox/curated_thoughts_convs:** Curated conversational data emphasizing thoughtful responses |
|
3. **Daemontatox/natural_reasoning:** Natural language reasoning examples and explanations |
|
4. **Daemontatox/numina_math_cconvs:** Mathematical conversation and problem-solving data |
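All four datasets are public on the Hugging Face Hub and can be inspected with the `datasets` library before training. A minimal sketch; the `train` split name is an assumption, so check each dataset card for the actual schema:

```python
from datasets import load_dataset

# Inspect one of the fine-tuning datasets; the "train" split name is an
# assumption, so consult the dataset card for the authoritative layout.
ds = load_dataset("Tesslate/Gradient-Reasoning", split="train")
print(ds)      # row count and column names
print(ds[0])   # first example
```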
|
|
|
### Training Procedure |
|
|
|
- **Base Model:** Qwen3-30B-A3B |
|
- **Training Objective:** Optimized for efficient reasoning and reaching conclusions in fewer generation steps
|
- **Architecture:** Mixture-of-Experts with 3B activated parameters |
|
- **Training Constraint:** Limited by resource availability, resulting in an abbreviated training cycle
|
|
|
### Training Infrastructure |
|
|
|
- **Hardware:** 2× NVIDIA A100 GPUs

- **Training Time:** ~72 hours

- **Compute Resources:** Resource-constrained environment
|
|
|
## Evaluation |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
The model's performance should be evaluated on: |
|
- **Reasoning Benchmarks:** GSM8K, MATH, LogiQA |
|
- **General Language Tasks:** MMLU, HellaSwag, ARC |
|
- **Efficiency Metrics:** Inference speed, memory usage (a throughput sketch follows this list)
|
- **Reasoning Quality:** Step-by-step problem solving accuracy |
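For the efficiency metrics, a simple tokens-per-second measurement can serve as a first pass. This is a minimal sketch, not a formal benchmark protocol; the prompt and generation length are arbitrary choices:

```python
import time
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Minimal tokens-per-second measurement; prompt and generation length
# are arbitrary choices, not an official benchmark protocol.
model_name = "Daemontatox/Mini-Hydra"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Explain why the sky is blue.", return_tensors="pt").to(model.device)

start = time.perf_counter()
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```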
|
|
|
### Results |
|
|
|
*Specific benchmark results will be added here once available.*
|
|
|
Pending formal benchmark results, the model is expected to show:
|
- Improved reasoning efficiency compared to dense models of similar size |
|
- Competitive performance despite resource-constrained training |
|
- Faster inference times due to MoE architecture |
|
|
|
## Technical Specifications |
|
|
|
### Model Architecture |
|
|
|
- **Base:** Qwen3-30B-A3B MoE architecture |
|
- **Experts:** Multiple expert networks with routing mechanism |
|
- **Activated Parameters:** 3 billion per forward pass |
|
- **Total Parameters:** ~30 billion |
|
- **Context Length:** Inherited from the base model (32,768 tokens natively)

- **Vocabulary Size:** Inherited from the base model (the config sketch below shows how to read both values directly)
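Rather than relying on the figures above, the expert counts, context length, and vocabulary size can be read directly from the model configuration. A minimal sketch; the attribute names follow the Qwen3-MoE config class in `transformers` and should be verified against your installed version:

```python
from transformers import AutoConfig

# Inspect the inherited Qwen3-MoE configuration without downloading weights.
# Attribute names follow the Qwen3-MoE config class in transformers;
# verify against your installed version.
config = AutoConfig.from_pretrained("Daemontatox/Mini-Hydra")
print(config.num_experts)              # total routed experts
print(config.num_experts_per_tok)      # experts activated per token
print(config.max_position_embeddings)  # context length
print(config.vocab_size)               # vocabulary size
```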
|
|
|
### Compute Infrastructure |
|
|
|
- **Training:** Resource-constrained environment |
|
- **Inference:** Optimized for efficiency with 3B activated parameters |
|
- **Memory Requirements:** Significantly reduced compared to an equivalent dense model; a 4-bit loading sketch follows below
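If even the sparse-activation footprint exceeds your hardware, 4-bit loading via `bitsandbytes` is one option. This is a hedged sketch; quantization quality on this particular checkpoint has not been validated:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantized loading to cut memory further; requires the
# bitsandbytes package and a CUDA GPU. Quality impact on this
# checkpoint is untested; treat as a starting point only.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Daemontatox/Mini-Hydra",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Daemontatox/Mini-Hydra")
```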
|
|
|
## How to Use |
|
|
|
### Installation |
|
|
|
```bash |
|
pip install transformers torch accelerate |
|
``` |
|
|
|
### Basic Usage |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "Daemontatox/Mini-Hydra"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Example inference
def generate_response(prompt, max_new_tokens=512):
    # Keep inputs on the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens, not the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example usage
prompt = "Solve this step by step: If a train travels 120 miles in 2 hours, and then 180 miles in 3 hours, what is the average speed for the entire journey?"
response = generate_response(prompt)
print(response)
|
``` |
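Because the base model ships with a chat template, instruction-style requests often work better when formatted through `tokenizer.apply_chat_template` rather than as a raw string. A sketch reusing the model and tokenizer loaded above; the `enable_thinking` switch is part of the Qwen3 template and is assumed to have survived fine-tuning:

```python
# Chat-template formatting; assumes the Qwen3 chat template (including
# its enable_thinking switch) was preserved in this fine-tune. Check
# tokenizer_config.json if unsure.
messages = [
    {"role": "user", "content": "If a train travels 120 miles in 2 hours, "
                                "and then 180 miles in 3 hours, what is the "
                                "average speed for the entire journey?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Qwen3-specific; ignored if the template lacks it
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```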
|
|
|
### Advanced Usage with Custom Parameters |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

model_name = "Daemontatox/Mini-Hydra"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Custom generation configuration for reasoning tasks
generation_config = GenerationConfig(
    temperature=0.1,        # Lower temperature for more focused reasoning
    top_p=0.9,
    top_k=50,
    repetition_penalty=1.1,
    max_new_tokens=1024,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

def reasoning_generate(prompt, system_prompt="Think step by step and provide a clear reasoning process."):
    full_prompt = f"{system_prompt}\n\nProblem: {prompt}\n\nSolution:"
    inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            generation_config=generation_config
        )

    # Decode only the newly generated tokens
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example reasoning problem
math_problem = """
A rectangular garden has a length that is 3 times its width.
If the perimeter is 32 meters, what are the dimensions of the garden?
"""

solution = reasoning_generate(math_problem)
print(solution)
|
``` |
|
|
|
### Batch Processing |
|
|
|
```python |
|
# Decoder-only models should be left-padded for batched generation
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def batch_reasoning(prompts, batch_size=4):
    results = []

    for i in range(0, len(prompts), batch_size):
        batch_prompts = prompts[i:i+batch_size]
        batch_inputs = tokenizer(
            batch_prompts,
            return_tensors="pt",
            padding=True,
            truncation=True
        ).to(model.device)

        with torch.no_grad():
            batch_outputs = model.generate(
                **batch_inputs,
                max_new_tokens=512,
                temperature=0.7,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id
            )

        # With left padding, every row shares the same prompt length,
        # so the prompt tokens can be sliced off uniformly
        new_tokens = batch_outputs[:, batch_inputs["input_ids"].shape[1]:]
        batch_responses = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)
        results.extend(batch_responses)

    return results

# Example batch processing
problems = [
    "What is 15% of 240?",
    "If x + 5 = 12, what is x?",
    "A circle has radius 7. What is its area?",
    "Solve: 2x + 3 = 15"
]

solutions = batch_reasoning(problems)
for problem, solution in zip(problems, solutions):
    print(f"Problem: {problem}")
    print(f"Solution: {solution}")
    print("-" * 50)
|
``` |
|
|
|
## Example Outputs |
|
|
|
### Mathematical Reasoning |
|
|
|
```text
|
Input: "Solve this step by step: If a train travels 120 miles in 2 hours, and then 180 miles in 3 hours, what is the average speed for the entire journey?" |
|
|
|
Output: "Let me solve this step by step: |
|
|
|
Step 1: Calculate total distance |
|
Total distance = 120 miles + 180 miles = 300 miles |
|
|
|
Step 2: Calculate total time |
|
Total time = 2 hours + 3 hours = 5 hours |
|
|
|
Step 3: Calculate average speed |
|
Average speed = Total distance ÷ Total time |
|
Average speed = 300 miles ÷ 5 hours = 60 miles per hour |
|
|
|
Therefore, the average speed for the entire journey is 60 mph." |
|
``` |
|
|
|
### Logical Reasoning |
|
|
|
```text
|
Input: "If all roses are flowers, and some flowers are red, can we conclude that some roses are red?" |
|
|
|
Output: "Let me analyze this logical statement: |
|
|
|
Given premises: |
|
1. All roses are flowers |
|
2. Some flowers are red |
|
|
|
The question asks if we can conclude that some roses are red. |
|
|
|
Analysis: |
|
- From premise 1: roses ⊆ flowers (roses are a subset of flowers) |
|
- From premise 2: ∃ flowers that are red |
|
|
|
However, we cannot definitively conclude that some roses are red because: |
|
- The red flowers mentioned in premise 2 might be flowers other than roses |
|
- We have no information directly connecting roses to the property of being red |
|
|
|
Conclusion: No, we cannot logically conclude that some roses are red based solely on the given premises." |
|
``` |
|
|
|
### Code Generation with Reasoning |
|
|
|
**Input:** "Write a Python function to find the factorial of a number, and explain your approach."

**Output:** "Here's a Python function to calculate factorial with explanation:

```python
def factorial(n):
    """
    Calculate the factorial of a non-negative integer n.

    Args:
        n (int): Non-negative integer

    Returns:
        int: n! (factorial of n)

    Raises:
        ValueError: If n is negative
    """
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers")
    elif n == 0 or n == 1:
        return 1
    else:
        result = 1
        for i in range(2, n + 1):
            result *= i
        return result
```

Reasoning behind the approach:

1. Input validation: check for negative numbers, since factorial is undefined for them
2. Base cases: 0! = 1 and 1! = 1 by mathematical definition
3. Iterative calculation: for n > 1, multiply all integers from 2 to n
4. The iterative approach is more memory-efficient than recursion for large inputs

Example usage:

```python
print(factorial(5))  # Output: 120
print(factorial(0))  # Output: 1
```
|
|
|
## Model Card Authors |
|
|
|
**Primary Author:** Daemontatox |
|
|
|
## Model Card Contact |
|
|
|
For questions, issues, or collaboration opportunities, please contact through the Hugging Face model repository. |
|
|
|
## Citation |
|
|
|
```bibtex |
|
@misc{mini-hydra-2024, |
|
title={Mini-Hydra: Efficient Reasoning with Mixture-of-Experts}, |
|
author={Daemontatox}, |
|
year={2024}, |
|
publisher={Hugging Face}, |
|
  howpublished={\url{https://huggingface.co/Daemontatox/Mini-Hydra}},
|
note={Based on Qwen3-30B-A3B architecture} |
|
} |
|
``` |
|
|
|
--- |
|
|
|
*This model card follows the guidelines established by the Hugging Face Model Card framework and includes technical details, usage examples, and important limitations to ensure responsible use of the model.* |