akhauriyash/DDR1_Q1.5B-GRPO-DACD
Updated
akhauriyash/DDR1_Q1.5B-DAPO
akhauriyash/DDR1_Q1.5B-GRPO-CompMath-DummyReward
akhauriyash/DDR1_Q1.5B-GRPO-CompMath
akhauriyash/DDR1_Q1.5B-GRPOFixReward
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-E2EGRPO-OpenR1_Math_SpecR_GRPO_Mini-MiniSet
2B • Updated • 2
akhauriyash/RLM-GemmaS-Code-Amoeba-v0
0.2B • Updated • 2
akhauriyash/RLM-GemmaS-Code-PNAS-v0
0.2B • Updated • 2
akhauriyash/RLM-GemmaS-Code-DARTS-v0
0.2B • Updated • 2
• 1
akhauriyash/RLM-GemmaS-Code-v0
0.2B • Updated • 312
• 3
akhauriyash/RegressLM-gemma-s-RLM-table3
0.2B • Updated • 1
akhauriyash/E2EGRPO_bm14B_32Gen_8GAcc_2K_2xAccF
2B • Updated • 1
akhauriyash/E2EGRPO_bm14B_36Gen_2K_2xAccF
Updated
akhauriyash/E2EGRPO_bm14B_128Gen_2K_2xAccF
Updated
akhauriyash/E2EGRPO_bm14B_128Gen_2K_2xAcc
Updated
akhauriyash/E2EGRPO_bm14B_64Gen_4K_2xAcc
Updated
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-E2EGRPO-OpenR1_Math_SpecR_GRPO_Mini-MiniSet_14BDrafter
2B • Updated • 1
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-E2EGRPO-OpenR1_Math_SpecR_GRPO_Mini-MiniSet_32BDrafter
Updated
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-E2EGRPO-OpenR1-220K
2B • Updated • 2
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-GRPO-SpeculativeReasoner_Mini
Text Generation
• 2B • Updated • 4
akhauriyash/Llama-3.2-1B-Butler
Text Generation
• 1B • Updated • 13
akhauriyash/Llama-2-7b-hf-Butler
Text Generation
• 7B • Updated • 8
akhauriyash/Llama-3.1-8B-Butler
Text Generation
• 8B • Updated • 7
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-GRPO-SplitReasoner
Text Generation
• 2B • Updated • 8
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-GRPO-SpeculativeReasoner
Text Generation
• 2B • Updated • 8
• 1
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpeculativeReasoner
Text Generation
• 2B • Updated • 53
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SelfCompress_SFT_GRPO_INDUCETEST
2B • Updated • 1
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReason_SFT_GRPO_14k
2B • Updated • 3
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SelfCompress_SFT
Text Generation
• 2B • Updated • 4
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReasoner_SFT_GRPO_14k_v4
Updated