-
Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 12 -
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 45 -
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Paper • 2401.17574 • Published • 15 -
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 61
Collections
Discover the best community collections!
Collections including paper arxiv:2402.19155
-
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49 -
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 40 -
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 29
-
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Paper • 2402.08714 • Published • 10 -
Data Engineering for Scaling Language Models to 128K Context
Paper • 2402.10171 • Published • 21 -
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper • 2402.10893 • Published • 10 -
Coercing LLMs to do and reveal (almost) anything
Paper • 2402.14020 • Published • 12
-
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Paper • 2402.10176 • Published • 34 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49 -
Matryoshka Representation Learning
Paper • 2205.13147 • Published • 9 -
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Paper • 2403.04692 • Published • 40
-
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Paper • 2309.04662 • Published • 22 -
Neurons in Large Language Models: Dead, N-gram, Positional
Paper • 2309.04827 • Published • 16 -
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Paper • 2309.05516 • Published • 9 -
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
Paper • 2309.03907 • Published • 8