SmolLM2 360M for Mental Health

Model Description
IMPORTANT: This model has been deprecated in favor of the V2 release; use V2 for anything other than testing or research.
This is my first fine-tune of a model uploaded to the Hugging Face 🤗 Hub! It is based on SmolLM2-360M-Instruct from the Hugging Face team and was fully fine-tuned locally on an RTX 3050 Ti with only 4 GB of VRAM. The model has decent knowledge of common mental health topics, i.e., explaining to the user what anxiety, depression, PTSD, etc. are. From my limited testing, the model appears to excel at describing common mental health problems from a technical standpoint (such as explaining how the American Psychiatric Association defines depression), and it can provide high-level advice to the user on how to improve their mental health. At only 360 million parameters, the model is small enough to run on most devices and uses approximately 700 MB of memory for inference, so it is intended for lower-powered edge devices, including most modern smartphones.
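As a rough back-of-envelope check of that figure (my arithmetic, not a measured breakdown): 360M parameters × 2 bytes per FP16 weight ≈ 720 MB, so the ~700 MB observed at inference is essentially the weights alone, before any KV cache is counted.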
This model should in no way be used to treat, diagnose, or otherwise prevent mental health disorders; it is simply a demonstration of fully fine-tuning a small model on a consumer GPU. Be smart 😊
- Developed by: Alex Dzurec
- Model type: Large Language Model
- Language(s) (NLP): English (tested)
- License: Apache 2.0
- Finetuned from model: HuggingFaceTB/SmolLM2-360M-Instruct
Model Sources
- Repository: Github
Uses
Uses Discovered
- User mental health learning: Can teach the user the symptoms and definitions of standard mental health issues and provide examples
- "Advice": The model can give broad (albeit sometimes not great) advice to a user presenting with mental health conditions
Direct Use (Inference)
- System Prompt: This model was not trained with a specific system prompt, although V2's prompt has shown promise in testing.
- V2 System Prompt: "You are an extremely empathetic and helpful AI assistant named SmolHealth designed to listen to the user and provide insight."
- Temperature: 1.1 (temperatures between 1.0 and 1.1 have been found to work best for this model)
- top_p: 0.9 (have not tested other top_p values)
Use With Transformers 🤗
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer from the Hub
model_path = "dzur658/smollm2-mentalhealth-360m"
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Ensure a pad token is set if the tokenizer doesn't have one (the pipeline may need it)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Create a pipeline; we format the text *before* sending it to the generator call
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print("Model loaded and ready for interaction.")

# Define a more specific system prompt for the fine-tuned model
system_prompt_content = "You are an extremely empathetic and helpful AI assistant named SmolHealth designed to listen to the user and provide insight. You may ask follow up questions only before ending your turn."

while True:
    print("\nType 'quit' to leave the conversation.")
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break

    # 1. Construct the messages list with system and user prompts
    messages = [
        {"role": "system", "content": system_prompt_content},
        {"role": "user", "content": user_input},
    ]

    # 2. Apply the chat template
    # add_generation_prompt=True is crucial: it adds the cue for the assistant to start responding
    try:
        formatted_prompt = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
        )
    except Exception as e:
        print(f"Error applying chat template: {e}")
        print("Ensure your tokenizer has a chat_template attribute properly configured.")
        continue  # Skip this turn if formatting fails

    # 3. Generate a response from the fully formatted prompt,
    #    passing generation parameters directly for more control
    response = generator(
        formatted_prompt,
        max_new_tokens=1024,
        num_return_sequences=1,
        return_full_text=False,  # Get only the newly generated text
        do_sample=True,          # Use sampling
        temperature=1.0,         # Adjust for creativity vs. focus
        top_p=0.9,               # Nucleus sampling
        # repetition_penalty=1.1,  # Optionally try to reduce parroting further
    )
    print("Model:", response[0]['generated_text'].strip())

print("Exiting.")
Use with GGUF
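No GGUF build is documented here. As a hedged sketch, if you convert the model yourself (e.g. with llama.cpp's convert_hf_to_gguf.py), it could be run from Python via the llama-cpp-python bindings; the filename below is a placeholder, not a published file:

from llama_cpp import Llama

# Placeholder filename: produce this yourself by converting the model to GGUF
llm = Llama(model_path="smollm2-mentalhealth-360m.gguf")
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an extremely empathetic and helpful AI assistant named SmolHealth designed to listen to the user and provide insight."},
        {"role": "user", "content": "What is anxiety?"},
    ],
    temperature=1.1,  # The range recommended above
    top_p=0.9,
)
print(out["choices"][0]["message"]["content"])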
Out-of-Scope Use
The model should not be used to treat mental health disorders, nor should this model be used as a substitute for a licensed professional.
Bias, Risks, and Limitations
Preliminary testing has revealed that the model sometimes outputs repeating text or (rarely) attempts to finish the user's thought. The more chat turns are passed into the pipeline, the more pronounced this effect seems to become.
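If you hit the repetition issue, the standard transformers generation penalties are a reasonable first mitigation. This is a general technique, not something validated for this model specifically, and the values below are untested starting points:

# Reuses `generator` and `formatted_prompt` from the Transformers example above
response = generator(
    formatted_prompt,
    max_new_tokens=1024,
    do_sample=True,
    temperature=1.0,
    top_p=0.9,
    repetition_penalty=1.1,   # Penalize tokens that have already been generated
    no_repeat_ngram_size=3,   # Block verbatim repeats of any 3-gram
)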
Training Details
Training Data
All credit for the dataset belongs to Amod.
Training Procedure
Full fine-tune of SmolLM2-360M in BF16 precision using the TRL library and PyTorch, running on an RTX 3050 Ti laptop GPU for 60 steps.
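For context, here is a rough (back-of-envelope, unverified) estimate of why a full fine-tune fits in 4 GB: BF16 weights take 0.36B × 2 bytes ≈ 720 MB, BF16 gradients another ≈ 720 MB, and the two 8-bit AdamW optimizer states roughly 2 × 0.36B × 1 byte ≈ 720 MB, for ≈ 2.2 GB before activations. The small batch size and gradient accumulation keep activation memory within the remaining budget.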
Preprocessing
The following function was used to clean the raw dataset and format each Q/A pair into the chat template SmolLM2 expects:
def format_example(data):
    prompt = data["Context"].strip()
    response = data["Response"].strip()
    formatted = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}, {"role": "assistant", "content": response}],
        tokenize=False,
        add_generation_prompt=False,  # Important for training
    )
    return formatted
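A minimal sketch of how this could be wired up with 🤗 Datasets. The dataset ID Amod/mental_health_counseling_conversations is my assumption based on the Context/Response fields and the credit above; this card only credits "Amod":

from datasets import load_dataset

# Assumed dataset ID, not confirmed in this card
raw = load_dataset("Amod/mental_health_counseling_conversations", split="train")

# Map every Context/Response pair into a single chat-templated "text" column
dataset = raw.map(lambda ex: {"text": format_example(ex)}, remove_columns=raw.column_names)
print(dataset[0]["text"][:200])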
Training Hyperparameters
- Training regime:
import torch
from transformers import TrainingArguments

# Assumed definition: the original script sets this elsewhere; True on Ampere GPUs like the 3050 Ti
use_bf16 = torch.cuda.is_bf16_supported()

training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    max_steps=60,
    learning_rate=2e-4,
    fp16=not use_bf16,
    bf16=use_bf16,
    logging_steps=1,
    optim="adamw_8bit",  # Requires bitsandbytes
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    output_dir="smollm2-mentalhealth-360m-fp16",  # IMPORTANT: model save directory
)
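The card does not include the trainer wiring itself; below is a minimal sketch of how these arguments could plug into TRL's SFTTrainer. Exact keyword arguments vary across TRL versions, so treat this as an outline rather than the exact training script:

from trl import SFTTrainer

# Sketch only: starts from the base model and uses the chat-templated
# `dataset` from the preprocessing section above
trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-360M-Instruct",
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
trainer.save_model(training_args.output_dir)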
Environmental Impact
- Hardware Type: RTX 3050 Ti mobile GPU
- Hours used: 1.5
- Carbon Emitted: ~121 g of CO₂
More Information
This model was primarily created as my first step toward fine-tuning small LLMs capable of running on mobile devices, and toward proving (some) viability of local fine-tuning.
Model Card Authors
Alex Dzurec
Credit
If you use this model, please credit me by name (Alex Dzurec) or by my Hugging Face 🤗 username (dzur658).