Park (sh110495)
AI & ML interests: None yet
Recent Activity
- updated a collection about 1 month ago: Long Context
- upvoted a paper about 1 month ago: A Controllable Examination for Long-Context Language Models
- upvoted a paper 2 months ago: Trillion 7B Technical Report
Organizations
Interested
- Large Language Model Unlearning via Embedding-Corrupted Prompts
  Paper • 2406.07933 • Published • 9
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 42
- Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
  Paper • 2406.12050 • Published • 19
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
  Paper • 2406.11813 • Published • 32
Data Selection
- Instruction Mining: High-Quality Instruction Data Selection for Large Language Models
  Paper • 2307.06290 • Published • 10
- Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models
  Paper • 2405.17915 • Published • 2
- Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models
  Paper • 2412.02980 • Published • 15
Long Context
- LLoCO: Learning Long Contexts Offline
  Paper • 2404.07979 • Published • 23
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 117
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 18
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 23
Evaluation
RL
- WPO: Enhancing RLHF with Weighted Preference Optimization
  Paper • 2406.11827 • Published • 15
- Self-Improving Robust Preference Optimization
  Paper • 2406.01660 • Published • 20
- Bootstrapping Language Models with DPO Implicit Rewards
  Paper • 2406.09760 • Published • 41
- BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM
  Paper • 2406.12168 • Published • 7