dongyh/FANformer-1B
Text Generation
Transformers
Safetensors
allenai/dolma
English
hf_olmo
custom_code
arxiv:2502.21309
arxiv:2410.02675
License: mit
Discussions
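With a FAN-based Transformer, is it possible that a model labeled 1B, which really does have 1B parameters, still uses as much GPU memory or RAM during training as a 3B model?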
#2 opened about 2 months ago by abaabbbab (1 comment)
How do you evaluate GSM8K?
#1 opened 2 months ago by yxli2123