Loggenix MoE 0.1B - Early Stopped

This is a Mixture of Experts (MoE) model based on the Qwen3 architecture, trained on coding data.

Model Details

  • Base Model: Qwen/Qwen3-0.6B
  • Architecture: Qwen3MoE
  • Parameters: ~80M
  • Experts: 4 experts, 2 active per token
  • Context Length: 512 tokens
  • Training Status: Early stopped due to convergence

Model Configuration

  • Hidden Size: 512
  • Attention Heads: 8
  • Hidden Layers: 12
  • Vocabulary Size: 151,936
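
These values can be checked directly against the uploaded configuration. The sketch below is illustrative; the field names follow the standard Qwen3MoE configuration in transformers and are an assumption about how this checkpoint stores them.

from transformers import AutoConfig

# Load the configuration shipped with this checkpoint
config = AutoConfig.from_pretrained("kshitijthakkar/loggenix-moe-0.1B-e2-lr5e5-b4-3060-early-stopped")

print(config.hidden_size)          # expected: 512
print(config.num_attention_heads)  # expected: 8
print(config.num_hidden_layers)    # expected: 12
print(config.vocab_size)           # expected: 151936
print(config.num_experts)          # expected: 4
print(config.num_experts_per_tok)  # expected: 2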

Usage

from transformers import AutoTokenizer, Qwen3MoeForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kshitijthakkar/loggenix-moe-0.1B-e2-lr5e5-b4-3060-early-stopped")
model = Qwen3MoeForCausalLM.from_pretrained("kshitijthakkar/loggenix-moe-0.1B-e2-lr5e5-b4-3060-early-stopped")

# Example: generate a code completion for a Python prompt
input_text = "def factorial(n):"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
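
As a quick follow-up, you can count the parameters of the loaded model. Note that in an MoE model the total count includes all 4 experts, while only 2 are active per token, so the number of parameters used per forward pass is smaller than the total. This snippet reuses the `model` object loaded above.

# Count parameters in the loaded checkpoint (reuses `model` from above)
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params / 1e6:.1f}M")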

Training Details

  • Dataset: Custom coding dataset
  • Training stopped early due to validation loss convergence
  • Best model checkpoint saved and uploaded
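
The exact training script is not included in this repository. The following is a minimal, hypothetical sketch of how validation-loss early stopping can be wired up with the transformers Trainer. The hyperparameters mirror what the repository name suggests (learning rate 5e-5, batch size 4, 2 epochs), but they, along with the output path, patience value, and the train_dataset/eval_dataset placeholders, are assumptions, not the author's confirmed setup.

from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="loggenix-moe-checkpoints",  # placeholder path
    eval_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,            # restore the best checkpoint at the end
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    num_train_epochs=2,
)

trainer = Trainer(
    model=model,                            # model loaded as in the Usage section
    args=args,
    train_dataset=train_dataset,            # placeholder: tokenized coding dataset
    eval_dataset=eval_dataset,              # placeholder: held-out validation split
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()                             # stops once eval_loss stops improving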