Fine-tuning Magistral-Small-2506 with 4-bit Quantization for Medical Reasoning (MCQs)

This project fine-tunes the mistralai/Magistral-Small-2506 model using a medical reasoning dataset (mamachang/medical-reasoning) with 4-bit quantization for memory-efficient training.


Setup

  1. Install the required libraries:
pip install -U datasets accelerate peft trl bitsandbytes
pip install -U transformers==4.52.1
pip install huggingface_hub
  2. Authenticate with Hugging Face Hub:

Make sure your Hugging Face token is stored in an environment variable:

export HF_TOKEN=your_huggingface_token

The notebook will automatically log you in using this token.
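
A minimal sketch of that login step (the function is from huggingface_hub; the notebook's exact cell may differ):

import os
from huggingface_hub import login

# Read the token from the environment and authenticate with the Hub
login(token=os.environ["HF_TOKEN"])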


How to Run

  1. Load the Model and Tokenizer
    The script downloads the 4-bit quantized version of Magistral-Small-2506.

  2. Prepare the Dataset

    • The notebook uses mamachang/medical-reasoning.
    • It formats each example into an instruction-following prompt with step-by-step chain-of-thought reasoning.
  3. Fine-tuning

    • Fine-tuning is set up with PEFT (LoRA-style adapter tuning) so that only a small subset of model parameters is updated.
    • TRL (Transformer Reinforcement Learning) provides the trainer for efficient supervised fine-tuning; see the sketch after this list.
  4. Push Fine-tuned Model

    • After training, the fine-tuned adapter and tokenizer are pushed to your repository on the Hugging Face Hub; see the sketch below the notebook link.
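
A minimal sketch of the training setup in steps 3 and 4, assuming TRL's SFTTrainer and typical LoRA settings (the rank, alpha, target modules, and training hyperparameters below are illustrative, not the notebook's exact values; base_model and formatted_dataset are defined in the sketches further down):

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTConfig, SFTTrainer

# Illustrative LoRA settings; the notebook's exact hyperparameters may differ
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Make the 4-bit base model trainable and attach the LoRA adapter
model = prepare_model_for_kbit_training(base_model)
model = get_peft_model(model, peft_config)

trainer = SFTTrainer(
    model=model,
    train_dataset=formatted_dataset,  # built in the dataset sketch below
    args=SFTConfig(
        output_dir="Magistral-Small-Medical-QA",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
    ),
)
trainer.train()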

🧑‍💻 Here is the training notebook: Fine_tuning_Magistral-Small
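
A minimal sketch of the push in step 4, using the adapter repository name from the usage script below:

# Push the fine-tuned LoRA adapter and tokenizer to the Hugging Face Hub
model.push_to_hub("kingabzpro/Magistral-Small-Medical-QA")
tokenizer.push_to_hub("kingabzpro/Magistral-Small-Medical-QA")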

Model Configuration

  • Base Model: mistralai/Magistral-Small-2506
  • Quantization: 4-bit (NF4); see the loading sketch below
  • Training: PEFT (LoRA) + TRL
  • Dataset: mamachang/medical-reasoning (all examples)
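
A minimal sketch of the 4-bit NF4 loading listed above, assuming the base checkpoint is quantized on the fly with bitsandbytes (the usage script below instead pulls the pre-quantized unsloth/Magistral-Small-2506-bnb-4bit checkpoint):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NF4 quantization, as listed above
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16 for speed/stability
    bnb_4bit_use_double_quant=True,          # common choice; assumption, not confirmed
)

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Magistral-Small-2506",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Magistral-Small-2506")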

Notes

  • GPU Required: Make sure you have access to a single A100 (you can rent one from RunPod by the hour); training took only 50 minutes.
  • Environment: The notebook expects NVIDIA CUDA drivers to be available (an nvidia-smi check is included; a quick programmatic check is sketched after this list).
  • Memory Efficiency: 4-bit loading greatly reduces memory footprint.
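
A quick programmatic check in the same spirit as the notebook's nvidia-smi cell:

import torch

# Fail fast if no CUDA-capable GPU is visible to PyTorch
assert torch.cuda.is_available(), "No CUDA GPU detected"
print(torch.cuda.get_device_name(0))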

Example Prompt Format

"""
Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.

### Question:
{}

### Response:
{}"""

Usage Script (tested)

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Base model
base_model_id = "unsloth/Magistral-Small-2506-bnb-4bit"

# Your fine-tuned LoRA adapter repository
lora_adapter_id = "kingabzpro/Magistral-Small-Medical-QA"

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    lora_adapter_id,
    device_map="auto",
    trust_remote_code=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)

# Inference example
prompt = """
Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.

### Question:
A research group wants to assess the relationship between childhood diet and cardiovascular disease in adulthood.
A prospective cohort study of 500 children between 10 to 15 years of age is conducted in which the participants' diets are recorded for 1 year and then the patients are assessed 20 years later for the presence of cardiovascular disease.
A statistically significant association is found between childhood consumption of vegetables and decreased risk of hyperlipidemia and exercise tolerance.
When these findings are submitted to a scientific journal, a peer reviewer comments that the researchers did not discuss the study's validity.
Which of the following additional analyses would most likely address the concerns about this study's design? 
{'A': 'Blinding', 'B': 'Crossover', 'C': 'Matching', 'D': 'Stratification', 'E': 'Randomization'},
### Response:
<analysis>

"""

# Tokenize the prompt. No EOS token is appended here: an end-of-sequence
# marker before generation can cut the model's response short.
inputs = tokenizer(
    prompt,
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])

Output:

<analysis>

Analysis:

This is a prospective cohort study looking at the relationship between childhood diet and cardiovascular disease in adulthood. The peer reviewer is concerned about the validity of the study's findings. To address concerns about validity in a prospective cohort study, we need to consider potential confounding factors and selection bias. 

Choice A, blinding, is not relevant since this is an observational study, not a clinical trial. 

Choice B, crossover, is also not applicable since this is a cohort study.

Choice C, matching, could help control for confounding if patients were matched on relevant factors. However, the question does not indicate matching was done.

Choice D, stratification, could help control for confounding by stratifying by key variables. This is a reasonable option.

Choice E, randomization, is the best option. Randomizing patients to different diets would help control for confounding and selection bias. Randomization is the gold standard for controlling confounding in observational studies.
</analysis>
<answer>
E: Randomization
</answer>