Aarushhh/SEWY2-640M-untrained
Tags: Text Generation · Transformers · Safetensors · Sewy_v2 · conversational · custom_code
License: cc-by-nc-sa-4.0
# Sewy2 (untrained) 640M

A new MoE architecture that combines the following:
- DeepseekV3
- nGPT
- ResFormer
- NeuTRENO (as in ResFormer)
- Tanh logit softcapping (as in Gemma2)
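Of the techniques above, tanh logit softcapping is the simplest to illustrate: logits are squashed through `cap * tanh(x / cap)` so they stay smoothly bounded in `(-cap, cap)` instead of being hard-clipped. A minimal sketch (the function name and the cap value of 30.0 are illustrative assumptions, not the model's actual code):

```python
import numpy as np

def softcap(logits: np.ndarray, cap: float = 30.0) -> np.ndarray:
    """Tanh soft-capping as used in Gemma2-style models.

    Near zero this is approximately the identity; for large |x| it
    saturates smoothly toward +/-cap, keeping gradients finite.
    """
    return cap * np.tanh(logits / cap)

x = np.array([0.1, 5.0, 1000.0])
y = softcap(x)  # small values pass almost unchanged; 1000.0 saturates near 30.0
```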
Architecture:

- 32 layers
- 32 heads
- 32 KV heads
- 64 experts
- 8 experts per token
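The "64 experts, 8 experts per token" configuration means the router scores all 64 experts for each token and dispatches the token only to the 8 highest-scoring ones, with softmax weights renormalized over that subset. A hedged sketch of such top-k routing (a generic MoE pattern, not the repository's `custom_code`; the helper name is hypothetical):

```python
import numpy as np

def topk_route(router_logits: np.ndarray, k: int = 8):
    """Select the top-k experts per token and normalize their weights.

    router_logits: (num_tokens, num_experts) raw router scores.
    Returns (indices, weights), each of shape (num_tokens, k).
    """
    # Indices of the k largest logits per token (order within the k is arbitrary).
    idx = np.argpartition(router_logits, -k, axis=-1)[:, -k:]
    picked = np.take_along_axis(router_logits, idx, axis=-1)
    # Softmax over the selected experts only, so the k weights sum to 1.
    e = np.exp(picked - picked.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return idx, weights

# 4 tokens routed over 64 experts, 8 active per token.
logits = np.random.randn(4, 64)
indices, weights = topk_route(logits, k=8)
```

Only the 8 selected expert FFNs run for each token, which is why a 640M-parameter MoE activates far fewer parameters per forward pass than a dense model of the same size.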
Safetensors · Model size: 640M params · Tensor type: F32