gupta-tanish/mistral-instruct-v0.2-on-policy-mpo-iteration1 Text Generation • Updated about 7 hours ago
gupta-tanish/llama3-8b-instruct-on-policy-swepo-iteration1 Viewer • Updated about 16 hours ago • 39.5k
gupta-tanish/mistral-instruct-v0.2-on-policy-swepo-iteration1 Viewer • Updated about 19 hours ago • 39.5k
gupta-tanish/Ultrafeedback-llama3-8b-instruct-1vs3-selection-swepo-on-policy-iteration2 Viewer • Updated 14 days ago • 63.1k • 31
gupta-tanish/Ultrafeedback-llama3-8b-Instruct-optimal-selection-1vs7_total_responses_24 Viewer • Updated 15 days ago • 60.8k • 34
gupta-tanish/Ultrafeedback-llama3-8b-Instruct-optimal-selection-1vs7_total_responses_16 Viewer • Updated 15 days ago • 60.8k • 35
gupta-tanish/Ultrafeedback-mistral-7b-instruct-v0.2-1vs3-optimal-selection Viewer • Updated 16 days ago • 62.2k • 33
gupta-tanish/Ultrafeedback-mistral-7b-instruct-1vs3-kmeans-selection Viewer • Updated 16 days ago • 62.2k • 34
gupta-tanish/Ultrafeedback-llama3-8b-instruct-1vs3-optimal-selection Viewer • Updated 17 days ago • 62.2k • 32