Reproducing DeepSeek R1 Zero with Qwen2.5-0.5B on two 4090 GPUs
rasdani
Recent Activity
updated a dataset about 2 hours ago: rasdani/github-patches-decontaminated
updated a dataset about 4 hours ago: rasdani/github-patches
published a dataset about 15 hours ago: rasdani/github-patches-decontaminated
Collections: 1
Papers: 1
Models: 23

rasdani/qwen3_0_6b_function_rm • 23
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-8192k • 12
rasdani/Qwen2.5-0.5B-simpleRL-Zoo • Text Generation • 9
rasdani/smolR1-Qwen2.5-0.5B • Text Generation • 11
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-no-KL
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-3072k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-4096k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-2560k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-2048k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-first-try • 11

Datasets: 109
rasdani/github-patches-decontaminated • 67.5k • 5
rasdani/github-patches • 69.7k • 107
rasdani/github-patches-debug-genesys • 1.64k • 84
rasdani/github-patches-debug • 1.64k • 122
rasdani/github-patches-10k-sample-sorted • 1.64k • 146
rasdani/reward-bench-2-if • 160 • 105
rasdani/swe-fixer-20k-sample-sorted • 2k • 116
rasdani/swe-fixer-10k-sample-sorted • 2k • 120
rasdani/swe-fixer-4k-token-limit-sorted-2k • 2k • 125
rasdani/swe-fixer-4k-token-limit-sorted • 54.3k • 149