---
base_model: QCRI/Fanar-1-9B-Instruct
datasets: AI-MO/NuminaMath-TIR
library_name: peft
model_name: Fanar-0.5B-GRPO-test
tags:
- generated_from_trainer
- trl
- grpo
- math
- reasoning
- R1
license: apache-2.0
language:
- ar
- en
pipeline_tag: text-generation
---

# 🧠 Fanar-Math-R1-GRPO

**Fanar-Math-R1-GRPO** is a reasoning-optimized language model built on [`QCRI/Fanar-1-9B-Instruct`](https://huggingface.co/QCRI/Fanar-1-9B-Instruct). It is fine-tuned with **Group Relative Policy Optimization (GRPO)**, the reinforcement learning method introduced in DeepSeekMath, on the [`AI-MO/NuminaMath-TIR`](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR) dataset. The model is designed for step-by-step mathematical problem solving with structured reasoning in both English and Arabic.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/NEcy5S2aYn2ly2filngUp.png)

---

## 🚀 Model Highlights

- 🔍 Fine-tuned with **GRPO**, a sample-efficient reinforcement learning method
- 🧮 Specializes in **multi-step mathematical reasoning**
- 💬 Produces responses in a structured format using `<think>` and `<answer>` tags
- 🧠 Trained with **TRL**, together with `transformers`, `peft`, and `math_verify`
- 🏷️ Useful for both instruction following and math-heavy dialogue generation

---

## 📦 Model Details

| Component       | Description                                                                    |
|-----------------|--------------------------------------------------------------------------------|
| **Base Model**  | [`QCRI/Fanar-1-9B-Instruct`](https://huggingface.co/QCRI/Fanar-1-9B-Instruct)  |
| **Fine-Tuning** | GRPO via Hugging Face [TRL](https://github.com/huggingface/trl)                |
| **Dataset**     | [`AI-MO/NuminaMath-TIR`](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR) |
| **Format**      | `<think> ... </think> <answer> ... </answer>` tagged reasoning structure       |
| **LoRA**        | Enabled (modules: `q_proj`, `v_proj`; rank 8)                                  |
| **Epochs**      | 1 (lightweight test configuration)                                             |
| **Tokenizer**   | Same as base model                                                             |

---

## 🧪 Inference Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import time

model_id = "Omartificial-Intelligence-Space/Fanar-Math-R1-GRPO"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def generate_with_reasoning(prompt_text):
    # Tokenize the prompt and move it to the model's device
    inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)

    # Generate a completion and time it
    start = time.time()
    with torch.no_grad():
        output = model.generate(**inputs, max_length=1024)
    end = time.time()

    generated = tokenizer.decode(output[0], skip_special_tokens=True)
    duration = end - start
    num_input_tokens = inputs["input_ids"].shape[1]
    num_generated_tokens = output.shape[1] - num_input_tokens
    return generated, duration, num_generated_tokens

# Example Arabic math problem:
# "In a city with a population of 1 million, if 60% of the population are adults
# and 40% of the adults work, how many workers are there in the city?"
prompt_text = '''في مدينة يبلغ عدد سكانها 1 مليون نسمة، إذا كان 60% من السكان بالغين، و40% من البالغين يعملون، فكم عدد العاملين في المدينة؟'''

result, time_taken, tokens = generate_with_reasoning(prompt_text)
print(result)
```

---

## 🛠️ Training Setup

### Configuration Summary

- **learning_rate**: 1e-5
- **epochs**: 1
- **max_completion_length**: 64
- **num_generations**: 4
- **gradient_accumulation_steps**: 16
- **logging_steps**: 10

### Reward Functions

- **accuracy_reward**: validates the correctness of the final answer using `math_verify`
- **format_reward**: checks for proper use of the `<think>` and `<answer>` tags (both rewards are sketched below)
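For context, here is a minimal sketch of how the configuration and reward functions above could be wired together with TRL's `GRPOTrainer`. It is an illustrative reconstruction rather than the exact training script: the system prompt, the `to_prompt` preprocessing helper, and the reward implementations are assumptions modeled on the common Open-R1-style recipe, while the hyperparameters, LoRA modules, dataset, and base model are taken from this card.

```python
# Hypothetical reconstruction of the training wiring; not the exact script used for this run.
import re

from datasets import load_dataset
from math_verify import LatexExtractionConfig, parse, verify
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Assumed system prompt enforcing the <think>/<answer> output structure.
SYSTEM_PROMPT = (
    "Answer in the following format: <think> step-by-step reasoning </think> "
    "<answer> final answer </answer>"
)

def format_reward(completions, **kwargs):
    """Reward 1.0 when a completion is wrapped in <think>...</think><answer>...</answer>."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    contents = [completion[0]["content"] for completion in completions]
    return [1.0 if re.match(pattern, c, re.DOTALL) else 0.0 for c in contents]

def accuracy_reward(completions, solution, **kwargs):
    """Reward 1.0 when math_verify judges the generated answer equivalent to the reference."""
    rewards = []
    for completion, reference in zip(completions, solution):
        gold = parse(reference, extraction_config=[LatexExtractionConfig()])
        answer = parse(completion[0]["content"], extraction_config=[LatexExtractionConfig()])
        rewards.append(1.0 if verify(gold, answer) else 0.0)
    return rewards

def to_prompt(example):
    # GRPOTrainer expects a "prompt" column; the "solution" column is forwarded to the rewards.
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": example["problem"]},
        ]
    }

dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train").map(to_prompt)

# LoRA settings from the Model Details table (q_proj / v_proj, rank 8).
peft_config = LoraConfig(r=8, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

# Hyperparameters from the Configuration Summary above.
training_args = GRPOConfig(
    output_dir="Fanar-Math-R1-GRPO",
    learning_rate=1e-5,
    num_train_epochs=1,
    max_completion_length=64,
    num_generations=4,
    gradient_accumulation_steps=16,
    logging_steps=10,
)

trainer = GRPOTrainer(
    model="QCRI/Fanar-1-9B-Instruct",
    reward_funcs=[format_reward, accuracy_reward],
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

During training, each reward function receives the sampled completions for a prompt (plus any extra dataset columns, such as `solution`) and returns one scalar per completion; GRPO then normalizes these rewards within each group of `num_generations` samples to form the advantage signal.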
### Libraries & Versions

```
transformers==4.47.1
trl==0.14.0
peft==0.14.0
datasets==2.21.0
math_verify==0.3.3
torch==2.4.1
```

---

## 📚 Output Format

The model is trained to follow a reasoning-first format:

```
<think>
أولاً، نحسب 60% من مليون نسمة، وهو 600,000. ثم نحسب 40% من هذا العدد، وهو 240,000.
</think>
<answer>
240,000
</answer>
```

(In English: "First, we compute 60% of one million people, which is 600,000. Then we compute 40% of that number, which is 240,000." The final answer is 240,000.)

---

## 🔬 Citations

### GRPO – DeepSeekMath

```bibtex
@article{zhihong2024deepseekmath,
  title={DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  author={Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Zhang, Mingchuan and Li, Y.K. and Wu, Y. and Guo, Daya},
  journal={arXiv preprint arXiv:2402.03300},
  year={2024}
}
```

### TRL Library

```bibtex
@misc{vonwerra2022trl,
  title={TRL: Transformer Reinforcement Learning},
  author={von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  year={2022},
  howpublished={\url{https://github.com/huggingface/trl}}
}
```

### Fanar Base Model

```bibtex
@misc{fanarllm2025,
  title={Fanar: An Arabic-Centric Multimodal Generative AI Platform},
  author={Fanar Team and Ummar Abbas and Mohammad Shahmeer Ahmad and Firoj Alam and Enes Altinisik and Ehsannedin Asgari and Yazan Boshmaf and Sabri Boughorbel and Sanjay Chawla and Shammur Chowdhury and Fahim Dalvi and Kareem Darwish and Nadir Durrani and Mohamed Elfeky and Ahmed Elmagarmid and Mohamed Eltabakh and Masoomali Fatehkia and Anastasios Fragkopoulos and Maram Hasanain and Majd Hawasly and Mus'ab Husaini and Soon-Gyo Jung and Ji Kim Lucas and Walid Magdy and Safa Messaoud and Abubakr Mohamed and Tasnim Mohiuddin and Basel Mousi and Hamdy Mubarak and Ahmad Musleh and Zan Naeem and Mourad Ouzzani and Dorde Popovic and Amin Sadeghi and Husrev Taha Sencar and Mohammed Shinoy and Omar Sinan and Yifan Zhang and Ahmed Ali and Yassine El Kheir and Xiaosong Ma and Chaoyi Ruan},
  year={2025},
  url={https://arxiv.org/abs/2501.13944}
}
```

---

## 🔗 Resources

- [DeepSeekMath Paper](https://arxiv.org/abs/2402.03300)
- [TRL Documentation](https://huggingface.co/docs/trl)
- [Open-R1 Project](https://github.com/huggingface/open-r1)

---

Happy reasoning! 🔍✨