ermiaazarkhalili
/

mistral-7b-instruct-v0.3-grpo-GSM8K

Text Generation

text-generation-inference

Model card Files Files and versions

Uploaded finetuned model

Developed by: ermiaazarkhalili
License: apache-2.0
Finetuned from model : unsloth/mistral-7b-instruct-v0.3-bnb-4bit

This qwen2 model was trained 2x faster with Unsloth and Huggingface's TRL library.

Downloads last month: 10

Safetensors

Model size

7.25B params

Tensor type

BF16

·

Inference Providers NEW

Text Generation

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ermiaazarkhalili/mistral-7b-instruct-v0.3-grpo-GSM8K

Base model

unsloth/mistral-7b-instruct-v0.3

Finetuned

(123)

this model

Dataset used to train ermiaazarkhalili/mistral-7b-instruct-v0.3-grpo-GSM8K

Collection including ermiaazarkhalili/mistral-7b-instruct-v0.3-grpo-GSM8K

Mistral-GRPO-GSM8K

2 items • Updated 8 days ago