masani/SFT_gsm8k_train_size_256_Llama-3.2-1B_epoch_4_global_step_4 Text Generation • 1B • Updated May 13 • 9
masani/SFT_gsm8k_train_size_1024_Llama-3.2-1B_epoch_2_global_step_8 Text Generation • 1B • Updated May 13 • 10
masani/SFT_gsm8k_train_size_512_Llama-3.2-1B_epoch_3_global_step_6 Text Generation • 1B • Updated May 13 • 9
masani/SFT_gsm8k_train_size_256_Llama-3.2-1B_epoch_5_global_step_5 Text Generation • 1B • Updated May 13 • 3
masani/SFT_gsm8k_train_size_4096_Llama-3.2-1B_epoch_1_global_step_16 Text Generation • 1B • Updated May 13 • 9
masani/SFT_gsm8k_train_size_1024_Llama-3.2-1B_epoch_1_global_step_4 Text Generation • 1B • Updated May 13 • 3
masani/SFT_gsm8k_train_size_2048_Llama-3.2-1B_epoch_1_global_step_8 Text Generation • 1B • Updated May 13 • 8
masani/SFT_gsm8k_train_size_512_Llama-3.2-1B_epoch_1_global_step_2 Text Generation • 1B • Updated May 13 • 3
masani/SFT_gsm8k_train_size_256_Llama-3.2-1B_epoch_1_global_step_1 Text Generation • 1B • Updated May 13 • 3
masani/SFT_cumulative_parity_length_16_bitwidth_1_1024_512_Qwen2-1.5B_6000_RL 2B • Updated May 11 • 1
masani/SFT_cumulative_parity_length_16_bitwidth_1_1024_512_Llama-3.2-1B_epoch_3_global_step_12 Text Generation • 1B • Updated May 10 • 872
masani/SFT_cumulative_parity_length_32_bitwidth_1_1024_512_Qwen2-1.5B_epoch_100_global_step_400 Text Generation • 2B • Updated May 2 • 3
masani/SFT_cumulative_parity_length_32_bitwidth_1_4096_512_Qwen2-1.5B_epoch_100_global_step_1600 Text Generation • 2B • Updated May 2 • 3
masani/SFT_cumulative_parity_length_32_bitwidth_1_2048_512_Qwen2-1.5B_epoch_100_global_step_800 Text Generation • 2B • Updated May 2 • 3
masani/SFT_cumulative_parity_length_16_bitwidth_1_2048_512_Qwen2-1.5B_epoch_8_global_step_64 Text Generation • 2B • Updated May 2 • 3