Spaces: open-r1/README

[Experiment] Training R1-Zero-like models with Open R1

#20 opened about 2 months ago by

[Experiment] Applying GRPO to DeepSeek-R1-Distill-Qwen-1.5B with LIMO

#15 opened 3 months ago by

Train on a fully open-source base model

#21 opened 28 days ago by

[Paper review] Small Models Struggle to Learn from Strong Reasoners

#19 opened 3 months ago by

Seeking Clarification on GRPO's Core Mechanisms (for Independent Implementation)

#18 opened 3 months ago by bird-of-paradise

⚠️ Chat template foot gun with DeepSeek distilled models and RL format reward function

#17 opened 3 months ago by

[Experiment] Applying Open-R1-Math-220k to smolThinking models.

#16 opened 3 months ago by

GoogleDeepMind Unstructured-To-JSON Model

#14 opened 4 months ago by bhaviktheslider

DeepSeek Distilled 32B Responding in Multi Language on English Prompting

#13 opened 4 months ago by bhaviktheslider

DeepSeek R1 Replication on Qwen 2.5 1.5B for Unstructured to Structured JSON Conversion

#12 opened 4 months ago by bhaviktheslider

Generating Synthetic questions with "Reverse Question Answering"?

#11 opened 4 months ago by georgebassemfouad

Multimodal R1

#10 opened 4 months ago by

Replicated R1 Strategy on 8*H100 GPUs - For Qwen-2.5-1.5b

#9 opened 4 months ago by bhaviktheslider

Unstructured Text to Structured Schema based on rules - Deepseek Distilled 7b Thinking responses

#6 opened 4 months ago by bhaviktheslider

SmolLm2-135 R1 Distill

#5 opened 4 months ago by

What is the compute needed for GRPO for 7B R1-Distill model?

#4 opened 4 months ago by

Reproducing Deepseek's numbers for MATH-500

#3 opened 4 months ago by

Recommend a dataset in the scientific domain made by us: EricLu/SCP-116K

#2 opened 4 months ago by

LLM Benchmarks and Data Leakage

#1 opened 4 months ago by