Fine-tuning Magistral-Small-2506 in 4-bit Quantization for Medical Reasoning (MCQs)

This project fine-tunes the mistralai/Magistral-Small-2506 model using a medical reasoning dataset (mamachang/medical-reasoning) with 4-bit quantization for memory-efficient training.

🧑‍💻Here is the training notebook: Fine_tuning_Magistral-Small

Notes

GPU Required: Make sure you have access to 1X A100. Get it from RunPod for an hours. Training took only 50 minutes.
Environment: The notebook expects an environment where NVIDIA CUDA drivers are available (nvidia-smi check is included).
Memory Efficiency: 4-bit loading greatly reduces memory footprint.

Usage Script (tested)

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Base model
base_model_id = "unsloth/Magistral-Small-2506-bnb-4bit"

# Your fine-tuned LoRA adapter repository
lora_adapter_id = "kingabzpro/Magistral-Small-Medical-QA"

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    lora_adapter_id,
    device_map="auto",
    trust_remote_code=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)

# Inference example
prompt = """
Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.

### Question:
A research group wants to assess the relationship between childhood diet and cardiovascular disease in adulthood.
A prospective cohort study of 500 children between 10 to 15 years of age is conducted in which the participants' diets are recorded for 1 year and then the patients are assessed 20 years later for the presence of cardiovascular disease.
A statistically significant association is found between childhood consumption of vegetables and decreased risk of hyperlipidemia and exercise tolerance.
When these findings are submitted to a scientific journal, a peer reviewer comments that the researchers did not discuss the study's validity.
Which of the following additional analyses would most likely address the concerns about this study's design? 
{'A': 'Blinding', 'B': 'Crossover', 'C': 'Matching', 'D': 'Stratification', 'E': 'Randomization'},
### Response:
<analysis>

"""

inputs = tokenizer(
    [prompt + tokenizer.eos_token],
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])

Output:

<analysis>

Analysis:
This is a prospective cohort study looking at the relationship between childhood diet and cardiovascular disease in adulthood. The key concern from the peer reviewer is about the study's validity. 

To address concerns about validity, the researchers could perform additional analyses to control for confounding. Matching and stratification would help control for known confounders like socioeconomic status or family history. Crossover and blinding are not applicable to this observational study design. Randomization would not be possible since the study is observational.
</analysis>
<answer>
D: Stratification
</answer>

kingabzpro
/

Magistral-Small-Medical-QA

Fine-tuning Magistral-Small-2506 in 4-bit Quantization for Medical Reasoning (MCQs)

Notes

Usage Script (tested)

Model tree for kingabzpro/Magistral-Small-Medical-QA

Dataset used to train kingabzpro/Magistral-Small-Medical-QA

Collection including kingabzpro/Magistral-Small-Medical-QA

🎛️Fine-tuning LLMs