Pruning
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect (Paper • 2403.03853 • Published • 66)
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot (Paper • 2301.00774 • Published • 3)
- The LLM Surgeon (Paper • 2312.17244 • Published • 9)
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns (Paper • 2401.15024 • Published • 75)
Bui Van Hop (hllj)
AI & ML interests: Computer Vision, Deep Learning, NLP
Quantization
(Continued) Pretraining
- Adapting Large Language Models via Reading Comprehension (Paper • 2309.09530 • Published • 80)
- Gemma: Open Models Based on Gemini Research and Technology (Paper • 2403.08295 • Published • 51)
- Simple and Scalable Strategies to Continually Pre-train Large Language Models (Paper • 2403.08763 • Published • 52)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (Paper • 2401.02954 • Published • 49)
Architectures
- Larimar: Large Language Models with Episodic Memory Control (Paper • 2403.11901 • Published • 34)
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints (Paper • 2212.05055 • Published • 5)
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (Paper • 2404.02258 • Published • 107)
- Multi-Head Mixture-of-Experts (Paper • 2404.15045 • Published • 61)
Framework
Dataset Processing Technique
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages (Paper • 2309.09400 • Published • 85)
- ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline (Paper • 2404.02893 • Published • 23)
- Best Practices and Lessons Learned on Synthetic Data for Language Models (Paper • 2404.07503 • Published • 32)
- OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data (Paper • 2404.12195 • Published • 12)
Vision-Language Model
- Visual Instruction Tuning (Paper • 2304.08485 • Published • 17)
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities (Paper • 2308.12966 • Published • 9)
- Improved Baselines with Visual Instruction Tuning (Paper • 2310.03744 • Published • 38)
- SILC: Improving Vision Language Pretraining with Self-Distillation (Paper • 2310.13355 • Published • 9)
Speculative Decoding
- Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting (Paper • 2404.18911 • Published • 31)
- Accelerating LLM Inference with Staged Speculative Decoding (Paper • 2308.04623 • Published • 25)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models (Paper • 2310.12962 • Published • 13)
- The Curious Case of Neural Text Degeneration (Paper • 1904.09751 • Published • 3)
PEFT
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (Paper • 2403.03507 • Published • 189)
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (Paper • 2309.14717 • Published • 44)
- ReFT: Representation Finetuning for Language Models (Paper • 2404.03592 • Published • 100)
Technical Report
- Yi: Open Foundation Models by 01.AI (Paper • 2403.04652 • Published • 66)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (Paper • 2401.02954 • Published • 49)
- Qwen Technical Report (Paper • 2309.16609 • Published • 37)
- Gemma: Open Models Based on Gemini Research and Technology (Paper • 2403.08295 • Published • 51)
RLHF
Retrieval Augmented Generation
Dataset
Insight Paper
Image-Text Models
Code LLMs