Loggenix MoE 0.1B - Early Stopped
This is a Mixture of Experts (MoE) model based on the Qwen3 architecture, trained on coding data.
Model Details
- Base Model: Qwen/Qwen3-0.6B
- Architecture: Qwen3MoE
- Parameters: ~80M
- Experts: 4 experts, 2 active per token (see the routing sketch below)
- Context Length: 512 tokens
- Training Status: Early stopped due to convergence
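The "2 active per token" figure refers to top-k expert routing: a gating layer scores every expert for each token and only the two highest-scoring experts are actually run, with their outputs mixed by the (renormalized) gate weights. The snippet below is a schematic sketch of that routing step under the sizes listed above; it is illustrative only and not the model's exact implementation.

import torch
import torch.nn.functional as F

# Schematic top-2 routing over 4 experts (illustrative, not the model's code).
hidden_size, num_experts, top_k = 512, 4, 2
tokens = torch.randn(3, hidden_size)                     # 3 example token embeddings
router = torch.nn.Linear(hidden_size, num_experts, bias=False)

logits = router(tokens)                                  # (3, 4) expert scores per token
weights, chosen = torch.topk(F.softmax(logits, dim=-1), top_k, dim=-1)
weights = weights / weights.sum(dim=-1, keepdim=True)    # renormalize over the 2 chosen experts
print(chosen)   # indices of the 2 experts used for each token
print(weights)  # mixing weights applied to those experts' outputs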
Model Configuration
- Hidden Size: 512
- Attention Heads: 8
- Hidden Layers: 12
- Vocabulary Size: 151,936
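These values can be checked directly against the checkpoint's configuration. The attribute names used below (hidden_size, num_attention_heads, num_hidden_layers, vocab_size, num_experts, num_experts_per_tok) are the usual Qwen MoE config fields in transformers and are assumed here rather than stated on this card.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("kshitijthakkar/loggenix-moe-0.1B-e2-lr5e5-b4-3060-early-stopped")
# Attribute names assume the standard Qwen MoE config layout in transformers.
print(config.hidden_size)          # expected: 512
print(config.num_attention_heads)  # expected: 8
print(config.num_hidden_layers)    # expected: 12
print(config.vocab_size)           # expected: 151936
print(config.num_experts)          # expected: 4
print(config.num_experts_per_tok)  # expected: 2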
Usage
from transformers import AutoTokenizer, Qwen3MoeForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kshitijthakkar/loggenix-moe-0.1B-e2-lr5e5-b4-3060-early-stopped")
model = Qwen3MoeForCausalLM.from_pretrained("kshitijthakkar/loggenix-moe-0.1B-e2-lr5e5-b4-3060-early-stopped")

# Example: complete a Python function signature
input_text = "def factorial(n):"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)  # pass attention_mask along with input_ids
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
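Continuing from the snippet above, the same call can be extended for GPU inference and longer, sampled completions. The sampling settings below are illustrative defaults, not values recommended by the model authors, and the prompt plus output should stay within the 512-token context.

import torch

# Move the loaded model to GPU if available and sample a longer completion.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,  # keep prompt + output within the 512-token context
    do_sample=True,      # illustrative sampling settings, not tuned values
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))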
Training Details
- Dataset: Custom coding dataset
- Training stopped early due to validation loss convergence
- Best model checkpoint saved and uploaded
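The training script itself is not published. As a rough sketch, early stopping on validation loss is commonly wired up with the transformers Trainer via EarlyStoppingCallback; the dataset objects, patience value, and evaluation cadence below are placeholders and not this model's actual settings (only the learning rate and batch size echo the tags in the repository name).

from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

# Placeholder setup: the actual dataset, schedule, and patience are not published.
args = TrainingArguments(
    output_dir="loggenix-moe-ckpts",
    eval_strategy="steps",
    eval_steps=500,
    save_steps=500,
    learning_rate=5e-5,              # matches the "lr5e5" tag in the repo name
    per_device_train_batch_size=4,   # matches the "b4" tag in the repo name
    load_best_model_at_end=True,     # keep the best checkpoint when stopping early
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,     # placeholder: the custom coding dataset
    eval_dataset=eval_dataset,       # placeholder: held-out validation split
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()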