Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? Paper • 2502.11895 • Published Feb 17, 2025 • 2
What makes a language easy to deep-learn? Deep neural networks and humans similarly benefit from compositional structure Paper • 2302.12239 • Published Feb 23, 2023 • 1
GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding Paper • 2311.09707 • Published Nov 16, 2023
When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization Paper • 2411.05882 • Published Nov 8, 2024 • 1
CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model Paper • 1902.06423 • Published Feb 18, 2019
Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots Space • 13.5k
Finally, a Replacement for BERT: Introducing ModernBERT Article • By bclavie and 14 others • Dec 19, 2024 • 683
KennethEnevoldsen/dfm-sentence-encoder-large Feature Extraction • 0.4B • Updated Nov 27, 2024 • 184 • 2