---
library_name: transformers
license: mit
datasets:
- allenai/dolma
language:
- en
---
# Model Card for FANformer-1B

## Model Description
- Model Name: FANformer-1B
- Non-embedding Parameters: 1.1B
- Training Tokens: 1 trillion
- Release Date: March 2025
- Model Type: Decoder-only LLM with enhanced periodicity modeling
- License: MIT License
- Repository: GitHub
- Paper: arXiv:2502.21309
FANformer-1B is a 1.1-billion-parameter autoregressive language model pre-trained from scratch to enhance language modeling through effective periodicity modeling. Its modified architecture (see olmo/model.py in the repository) introduces the FAN layer, a component designed to capture periodic patterns in the training data, improving learning efficiency and downstream performance.
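For intuition, the sketch below shows one way a FAN-style layer exposes periodic structure, following the FAN paper (arXiv:2410.02675): part of the output comes from sin/cos projections of the input, and the rest from an ordinary non-linear projection. The class name, dimensions, and split ratio here are illustrative assumptions, not the released configuration; olmo/model.py in the repository is the authoritative implementation.

```python
import torch
import torch.nn as nn

class FANLayerSketch(nn.Module):
    """Illustrative sketch of a FAN (Fourier Analysis Network) layer.

    A fraction of the output features are periodic (cos/sin of a linear
    projection); the remainder come from a standard non-linear projection.
    All names and the 25% periodic split are assumptions for illustration.
    """

    def __init__(self, d_in: int, d_out: int, periodic_ratio: float = 0.25):
        super().__init__()
        # d_p output dims come from each of cos(.) and sin(.); the rest from the non-linear branch.
        self.d_p = int(d_out * periodic_ratio) // 2
        d_rest = d_out - 2 * self.d_p
        self.periodic_proj = nn.Linear(d_in, self.d_p, bias=False)
        self.nonlinear_proj = nn.Linear(d_in, d_rest)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = self.periodic_proj(x)
        # Concatenate periodic features with the ordinary non-linear features.
        return torch.cat([torch.cos(p), torch.sin(p), self.act(self.nonlinear_proj(x))], dim=-1)

# Example: layer = FANLayerSketch(d_in=2048, d_out=2048)
```

How such periodic projections are integrated into the full decoder block is defined in the released modeling code.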
## Training Details

- Hardware: 80× A100 40 GB GPUs
- Training Data: Subset of the Dolma dataset (OLMo-1B's training corpus)
- Maximum Context Length: 2,048 tokens
## Intended Uses
- Primary Use: General-purpose text generation and understanding.
- Downstream Use: Can be fine-tuned for tasks such as summarization, question answering, and dialogue (see the sketch after this list).
- Limitations: May inherit biases from its training data. The model is trained primarily on English text, so performance on other languages is not guaranteed.
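Because the model loads through the standard transformers causal-LM interface, downstream fine-tuning can follow the usual Trainer workflow. The sketch below is a minimal outline under assumptions: a plain-text file named train.txt and illustrative hyperparameters, both of which you would replace for a real task.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("dongyh/FANformer-1B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("dongyh/FANformer-1B", trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # padding token needed for batching

# Assumption: your task data lives in a plain-text file with one example per line.
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    # 2,048 matches the model's maximum context length.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="fanformer-1b-finetuned",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```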
## How to Use
Inference Example:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required because FANformer-1B ships custom modeling code.
model = AutoModelForCausalLM.from_pretrained("dongyh/FANformer-1B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("dongyh/FANformer-1B", trust_remote_code=True)

input_text = "The concept of periodicity serves as a fundamental organizing principle across the natural world, human societies, and even abstract systems. From the rhythmic cycles of celestial bodies governing seasons and tides to the biological clocks regulating sleep and metabolism in living organisms, recurring patterns create stability amid chaos. In ecosystems, predator-prey population oscillations maintain balance, while the carbon cycle ensures Earth's climate resilience. Culturally, humanity has structured civilizations around agricultural cycles, religious calendars, and economic fluctuations—harvest festivals marking seasonal abundance, financial markets swaying between boom and bust. Even at the quantum level, wave functions reveal inherent periodicity that underpins material reality. This universal recurrence enables prediction, adaptation, and innovation: by recognizing cycles, we"

# Tokenize the prompt and sample a continuation.
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.6, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Evaluation
Values in parentheses are the approximate number of training tokens for each model.

| Standard Benchmarks | Llama-3.2-1B | TinyLLaMA-v1.1 (3T) | MobiLLaMA-1B (1.3T) | OLMo-1B (2T) | OpenELM-1_1B (1.8T) | OLMo-1B-0724 (3T) | AMD-OLMo-1B (1.3T) | FANformer-1B (1T) |
|---|---|---|---|---|---|---|---|---|
| arc_easy | 56.84 | 55.47 | 56.65 | 57.28 | 55.43 | 56.65 | 63.64 | 72.46 |
| arc_challenge | 38.13 | 32.68 | 32.00 | 31.06 | 32.34 | 32.34 | 33.70 | 43.81 |
| hellaswag | 64.00 | 61.47 | 61.80 | 62.92 | 64.81 | 66.12 | 63.61 | 64.76 |
| piqa | 73.80 | 73.56 | 75.30 | 75.14 | 75.57 | 75.08 | 75.57 | 75.55 |
| boolq | 64.30 | 55.99 | 60.83 | 61.74 | 63.58 | 66.18 | 60.58 | 64.92 |
| sciq | 92.30 | 89.30 | 88.20 | 87.00 | 90.60 | 92.70 | 93.20 | 94.80 |
| winogrande | 61.20 | 59.43 | 59.27 | 59.98 | 61.72 | 61.72 | 61.64 | 61.80 |
| openbookqa | 46.00 | 36.80 | 35.40 | 36.20 | 36.20 | 35.60 | 35.80 | 48.20 |
| gsm8k | 6.83 | 1.82 | 0.00 | 2.50 | 2.81 | 8.95 | 2.88 | 15.74 |
| Average | 55.93 | 51.84 | 52.16 | 52.65 | 53.67 | 55.04 | 54.51 | 60.23 |
## Citation
```bibtex
@article{dong2025fanformer,
  title={FANformer: Improving Large Language Models Through Effective Periodicity Modeling},
  author={Dong, Yihong and Li, Ge and Jiang, Xue and Tao, Yongding and Zhang, Kechi and Zhu, Hao and Liu, Huanyu and Ding, Jiazheng and Li, Jia and Deng, Jinliang and Mei, Hong},
  journal={arXiv preprint arXiv:2502.21309},
  year={2025}
}

@article{dong2024fan,
  title={FAN: Fourier Analysis Networks},
  author={Dong, Yihong and Li, Ge and Tao, Yongding and Jiang, Xue and Zhang, Kechi and Li, Jia and Su, Jing and Zhang, Jun and Xu, Jingjing},
  journal={arXiv preprint arXiv:2410.02675},
  year={2024}
}
```