Jialiang Cheng (Julius-L)

AI & ML interests: None yet

Recent Activity
- Upvoted an article (26 days ago): A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons
- Updated a collection (about 1 month ago): inference acceleration
- Upvoted a collection (3 months ago): 🧠Reasoning datasets

Organizations: None yet

Collections
Generation
Finetuning
Pretraining
Model Merging
Quantization
Unseen Papers
- MiniPLM: Knowledge Distillation for Pre-Training Language Models (Paper • 2410.17215 • Published • 16)
- LOGO -- Long cOntext aliGnment via efficient preference Optimization (Paper • 2410.18533 • Published • 44)
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss (Paper • 2410.17243 • Published • 95)
- LongReward: Improving Long-context Large Language Models with AI Feedback (Paper • 2410.21252 • Published • 18)
multimodal dataset
- BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks (Paper • 2412.04626 • Published • 14)
- GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI (Paper • 2411.14522 • Published • 39)
- Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination (Paper • 2411.03823 • Published • 50)
- Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data (Paper • 2410.18558 • Published • 20)
Long Context
- Why Does the Effective Context Length of LLMs Fall Short? (Paper • 2410.18745 • Published • 18)
- Language Models can Self-Lengthen to Generate Long Texts (Paper • 2410.23933 • Published • 18)
- ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference (Paper • 2410.21465 • Published • 11)
Memory Efficient Training
Model Architecture
- Differential Transformer (Paper • 2410.05258 • Published • 179)
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA (Paper • 2410.20672 • Published • 6)
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper • 2410.23168 • Published • 24)
Sparsification
LLM Technical Reports
- The Llama 3 Herd of Models (Paper • 2407.21783 • Published • 117)
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution (Paper • 2409.12191 • Published • 78)
- Baichuan Alignment Technical Report (Paper • 2410.14940 • Published • 52)
- A Survey of Small Language Models (Paper • 2410.20011 • Published • 44)
inference acceleration