Blog, Articles, and discussions

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

By June 3, 2025 guest • 91

Community Articles

When Does Reasoning Matter? Unpacking the Contribution of Reasoning to LLM Performance

and 1 other • about 14 hours ago • 10

Qianfan-VL: A Milestone Achievement in Chinese Multimodal AI with Domestic Chips

6 days ago • 8

Preserving Agency: Why AI Safety Needs Community, Not Corporate Control

1 day ago • 8

Code a simple RAG from scratch

Oct 29, 2024 • 205

Nemotron-Personas-Japan: Synthetic Dataset for Sovereign AI (in Japanese)

and 6 others • 5 days ago • 7

Uncensor any LLM with abliteration

Jun 13, 2024 • 683

PP-OCRv5 on Hugging Face: A Specialized Approach to OCR

and 5 others • 20 days ago • 102

Ground-up efforts to build large datasets for effective and accurate translation of Modi-Script documents into modern Marathi

and 1 other • 5 days ago • 6

Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face

Feb 11 • 72

How I Trained Action Chunking Transformer (ACT) on SO-101: My Journey, Gotchas, and Lessons

about 18 hours ago • 5

Practical arXiv Tips: How to Get More Attention for Your Paper (in Chinese)

Jul 8, 2024 • 14

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Feb 7 • 225

Small Language Models (SLM): A Comprehensive Overview

Feb 22 • 75

Understanding Gemma 3n: How MatFormer Gives You Many Models in One

Jun 26 • 47

PrediBench: Testing AI models on prediction markets

and 1 other • 6 days ago • 4

Introduction to State Space Models (SSM)

Jul 19, 2024 • 175

Preference Optimization for Vision Language Models

By July 10, 2024 • 84

Putting RL back in RLHF

By June 12, 2024 • 101

Constitutional AI with Open LLMs

By February 1, 2024 • 16

Preference Tuning LLMs with Direct Preference Optimization Methods

By January 18, 2024 • 72

The N Implementation Details of RLHF with PPO

By October 24, 2023 • 69

Finetune Stable Diffusion Models with DDPO via TRL

By September 29, 2023 guest • 18

Fine-tune Llama 2 with DPO

By August 8, 2023 • 64

StackLLaMA: A hands-on guide to train LLaMA with RLHF

By April 5, 2023 • 44

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

By March 9, 2023 • 67

Red-Teaming Large Language Models

By February 24, 2023 • 30

What Makes a Dialog Agent Useful?

By January 24, 2023 • 2

Illustrating Reinforcement Learning from Human Feedback (RLHF)

By December 9, 2022 • 350

Community Articles

There is no such thing as a tokenizer-free lunch

6 days ago • 64

Nemotron-Personas-Japan: Synthesized Data for Sovereign AI

and 6 others • 7 days ago • 22

Model Quality: Hugging Face Is All You Need

4 days ago • 16

RexBERT: Encoders for a brave new world of E-Commerce

and 1 other • 10 days ago • 46
