
momo

wzc991222

AI & ML interests

None yet

Recent Activity

reacted to Kseniase's post with 👍 about 13 hours ago

11 Alignment and Optimization Algorithms for LLMs

When we need to align models' behavior with the desired objectives, we rely on specialized algorithms that support helpfulness, accuracy, reasoning, safety, and alignment with user preferences. Much of a model’s usefulness comes from post-training optimization methods.

Here are the main optimization algorithms (both classic and new) in one place:

1. PPO (Proximal Policy Optimization) -> https://huggingface.co/papers/1707.06347
Clips the probability ratio to prevent the new policy from diverging too far from the old one, which keeps training stable.

2. DPO (Direct Preference Optimization) -> https://huggingface.co/papers/2305.18290
A non-RL method in which the LM itself acts as an implicit reward model. It uses a simple loss to boost the preferred answer’s probability over the less preferred one.

3. GRPO (Group Relative Policy Optimization) -> https://huggingface.co/papers/2402.03300
An RL method that compares a group of model outputs for the same input and updates the policy based on how each output scores relative to the rest of the group. It doesn't need a separate critic model. Its latest application is Flow-GRPO, which adds online RL to flow matching models -> https://huggingface.co/papers/2505.05470

4. DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) -> https://huggingface.co/papers/2503.14476
Decouples the lower and upper clipping bounds for flexibility and introduces 4 key techniques: clip-higher (to maintain exploration), dynamic sampling (to ensure gradient updates), token-level loss (to balance learning across long outputs), and overlong reward shaping (to handle long, truncated answers).

5. Supervised Fine-Tuning (SFT) -> https://huggingface.co/papers/2203.02155
Often the first post-pretraining step. A model is fine-tuned on a dataset of high-quality human-written input-output pairs to directly teach desired behaviors.

More in the comments 👇

If you liked it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
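To make the first three methods in the post concrete, here is a minimal per-sample sketch in plain Python. It is not taken from the post or from any library: the function names, the default values `eps_low=0.2`, `eps_high=0.2`, and `beta=0.1`, and the use of the population standard deviation for the group normalization are all illustrative assumptions. Real implementations operate on batched token-level log-probabilities and add terms (for example KL penalties) that are omitted here.

```python
import math
from statistics import mean, pstdev


def ppo_clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """PPO-style clipped surrogate for one sample (to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s). Passing eps_high > eps_low mimics
    the decoupled "clip-higher" idea described for DAPO (assumed values).
    """
    clipped = max(1.0 - eps_low, min(1.0 + eps_high, ratio))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss for one preference pair (to be minimized).

    Computes -log(sigmoid(beta * margin)), where margin is how much more
    the policy prefers the chosen answer than the reference model does.
    """
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    z = beta * margin
    # Numerically stable form of log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each reward within its group.

    No critic is needed; the group mean plays the role of the baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    print(ppo_clipped_objective(ratio=1.5, advantage=1.0))
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
    print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

In this sketch, a sample with a positive advantage stops contributing extra objective once its ratio exceeds `1 + eps_high`, the DPO loss falls as the policy widens its preference margin relative to the reference model, and the GRPO advantages within a group sum to roughly zero.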

liked a dataset 3 days ago

HuggingFaceTB/dclm-edu

upvoted a paper 3 days ago

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities


Organizations

wzc991222's activity

upvoted a paper 3 days ago

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Paper • 2505.02567 • Published 7 days ago • 64

upvoted 2 papers 6 days ago

A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency

Paper • 2505.01658 • Published 9 days ago • 30

Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

Paper • 2505.01043 • Published 10 days ago • 9

upvoted 5 papers 17 days ago

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 71

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 290

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 115

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 390

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 229

upvoted a paper 22 days ago

Sleep-time Compute: Beyond Inference Scaling at Test-time

Paper • 2504.13171 • Published 24 days ago • 15

upvoted an article 25 days ago

Don't repeat yourself - 🤗 Transformers Design Philosophy

Article • Apr 5, 2022 • 30

upvoted a collection 27 days ago

GLM-4-0414

Collection

GLM-4-0414 series model • 8 items • Updated 27 days ago • 125

upvoted a collection 28 days ago

MobileLLM

Collection

Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated Nov 27, 2024 • 114

upvoted a paper 28 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 181

upvoted 2 papers about 1 month ago

OmniSVG: A Unified Scalable Vector Graphics Generation Model

Paper • 2504.06263 • Published Apr 8 • 160

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 276

upvoted 4 papers about 2 months ago