nibauman
/

MPCxR1_Qwen1.5B_SFT_GRPO

Model card Files Files and versions Community

MPCxR1_Qwen1.5B_SFT_GRPO / README.md

nibauman's picture

Upload folder using huggingface_hub

8a459b9 verified about 1 month ago

|

history blame contribute delete

571 Bytes

	---
	base_model: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit
	library_name: peft
	---

	# Model Card for Model ID

	<!-- Provide a quick summary of what the model is/does. -->
	This is the model that is used to get the paper results for the MPCxR1 Qwen2.5 1.5B SFT GRPO model.
	This model was evaluated on the 19.04.25. and trained on the 18.04.25 at 15:30:47.

	Base model was "nibauman/race_llm_Qwen_1_5B_sft" [here](https://huggingface.co/nibauman/race_llm_Qwen_1_5B_sft)
	This is the wandb train: https://wandb.ai/CoRL-heist-2025/mpc_grpo/runs/4ydp2ilr?nw=nwusernibaumaneth