LLM Models - a Stalin16 Collection

updated 10 days ago

gradientai/Llama-3-8B-Instruct-Gradient-1048k
Text Generation • Updated Oct 29, 2024 • 26.7k • 679

Are Your LLMs Capable of Stable Reasoning?
Paper • 2412.13147 • Published Dec 17, 2024 • 95

RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
Paper • 2412.11919 • Published Dec 16, 2024 • 37

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published Dec 25, 2024 • 101

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper • 2501.01904 • Published Jan 3 • 34

VideoRAG: Retrieval-Augmented Generation over Video Corpus
Paper • 2501.05874 • Published Jan 10 • 72

Baichuan-Omni-1.5 Technical Report
Paper • 2501.15368 • Published Jan 26 • 64

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper • 2502.08910 • Published Feb 13 • 149

SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
Paper • 2502.18137 • Published Feb 25 • 57

Slamming: Training a Speech Language Model on One GPU in a Day
Paper • 2502.15814 • Published Feb 19 • 69

Transformers without Normalization
Paper • 2503.10622 • Published Mar 13 • 160

Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Paper • 2504.00072 • Published 26 days ago • 7

ReZero: Enhancing LLM search ability by trying one-more-time
Paper • 2504.11001 • Published 11 days ago • 14