Part of the Model Checkpoint Formats collection: versatile model save formats including LoRA, GGUF, and merged weights for deployment.
A specialized medical question-answering model built on Mistral-7B and fine-tuned on the FreedomIntelligence/medical-o1-reasoning-SFT dataset.
This model is a LoRA adaptation of Mistral-7B, fine-tuned to provide accurate and informative answers to medical questions. It's optimized using Unsloth for efficient training and inference.
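Besides the Unsloth workflow shown below, the adapter can in principle be loaded with plain transformers + PEFT. The following is a minimal, untested sketch that assumes the repository ships a standard PEFT adapter (adapter_config.json plus adapter weights) and tokenizer files; the Unsloth path remains the recommended one.

import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Hypothetical alternative load path; assumes a standard PEFT adapter
# sits on top of the 4-bit base model referenced in the adapter config.
model = AutoPeftModelForCausalLM.from_pretrained(
    "Subh775/mistral-7b-medical-o1-ft",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Subh775/mistral-7b-medical-o1-ft")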
To use this model:
!pip install unsloth
from unsloth import FastLanguageModel
import torch
# Define the Alpaca prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input_text}
### Response:
{output}"""
# Load your model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Subh775/mistral-7b-medical-o1-ft",
    max_seq_length=2048,
    load_in_4bit=True
)
# Enable optimized inference mode for faster generation
FastLanguageModel.for_inference(model)
# Function to handle the chat loop with memory
def chat():
    print("Chat with mistral-7b-medical-o1-ft! Type '\\q' or 'quit' to stop.\n")
    chat_history = ""  # Store the conversation history

    while True:
        # Get user input
        user_input = input("➤ ")

        # Exit condition
        if user_input.lower() in ['\\q', 'quit']:
            print("\nExiting the chat. Goodbye 🩺👍!")
            print("✨" + "=" * 27 + "✨\n")
            break

        # Append the current input to the chat history with instruction formatting
        prompt = alpaca_prompt.format(
            instruction="Please answer the following medical question.",
            input_text=user_input,
            output=""
        )
        chat_history += prompt + "\n"

        # Tokenize the combined history and move it to the GPU
        inputs = tokenizer([chat_history], return_tensors="pt").to("cuda")

        # Generate output with the configured sampling parameters
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            top_p=0.9,
            num_return_sequences=1,
            do_sample=True,
            no_repeat_ngram_size=2
        )

        # Decode and keep only the text after the last "### Response:" marker
        decoded_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        clean_output = decoded_output[0].split('### Response:')[-1].strip()

        # Add the response to the chat history
        chat_history += f"🧑‍⚕️: {clean_output}\n"

        # Display the response
        print(f"\n🧑‍⚕️: {clean_output}\n")
# Start the chat
chat()
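For a quick one-off query without the interactive loop, the same pieces can be reused directly. The sketch below relies on the model, tokenizer, and alpaca_prompt defined above; the question string is only an illustrative placeholder.

# Single-turn query, reusing model, tokenizer, and alpaca_prompt from the setup above.
question = "What are the common symptoms of iron-deficiency anemia?"  # illustrative example
prompt = alpaca_prompt.format(
    instruction="Please answer the following medical question.",
    input_text=question,
    output=""
)

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Keep only the text generated after the "### Response:" marker
answer = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0].split("### Response:")[-1].strip()
print(answer)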
This model was fine-tuned on the FreedomIntelligence/medical-o1-reasoning-SFT dataset, which contains approximately 50,000 high-quality medical question-answer pairs. The training used Unsloth for optimization and LoRA for parameter-efficient fine-tuning.
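The exact training configuration is not reproduced in this card. The sketch below only illustrates what an Unsloth + LoRA fine-tuning run on this dataset typically looks like; the LoRA rank, learning rate, batch size, dataset configuration name ("en"), and column names ("Question", "Response") are assumptions, not the actual values used. It also reuses the alpaca_prompt template from the usage section.

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Illustrative sketch only: hyperparameters, config name, and column names are assumptions.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of parameters is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

def to_text(example):
    # "Question" and "Response" are assumed column names in the dataset
    return {"text": alpaca_prompt.format(
        instruction="Please answer the following medical question.",
        input_text=example["Question"],
        output=example["Response"],
    ) + tokenizer.eos_token}

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")
dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()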
This model inherits the license from the base Mistral-7B model.
@misc{mistral-7b-medical-o1-ft,
author = {Subh775},
title = {Mistral-7B Medical QA Model},
year = {2025},
publisher = {HuggingFace},
journal = {HuggingFace Repository},
howpublished = {\url{https://huggingface.co/Subh775/mistral-7b-medical-o1-ft}}
}
Base model: unsloth/mistral-7b-bnb-4bit