metadata

language: en
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
  - text-generation
  - conversational
  - survey-response-generation
  - synthetic-data
  - fine-tuned
  - chatbot

aryashah00/survey-finetuned-tinyllama-for-deployment

Model Description

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 optimized for generating synthetic survey responses across multiple domains. It has been instruction-tuned using a custom dataset of survey responses, with each response reflecting a specific persona.

Training Data

Dataset Size: ~3,000 examples
Domains: 10 domains including healthcare, education, etc.
Format: ChatML instruction format with system and user prompts

Training Details

Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Usage

This model is specifically designed for generating synthetic survey responses from different personas. It works best when provided with:

A detailed persona description
A specific survey question

EXAMPLE Inference on CPU:

import torch
import os
from transformers import AutoTokenizer, LlamaForCausalLM, LlamaConfig

# Force CPU usage by hiding all CUDA devices
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Use a standard non-quantized TinyLlama model instead of the quantized one
model_name = "aryashah00/survey-finetuned-tinyllama-for-deployment"  # Base model without quantization

# Load tokenizer
print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model with explicit CPU configuration
print("Loading model on CPU (this may take a while)...")
model = LlamaForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="cpu"
)

print(f"Model loaded successfully on: {next(model.parameters()).device}")

# Example persona and survey question
persona = "A caring mother who lost her first child due to a miscarriage."
question = "Rate on a scale of 1(less likely) to 5(extremely likely) for the following question: I deeply care about others"

# Format messages following chat template
messages = [
    {"role": "system", "content": f"You are embodying the following persona: {persona}"},
    {"role": "user", "content": f"Survey Question: {question}\n\nPlease provide your honest score on a scale of 1 to 5 and detailed reason for this score to this question."}
]

# Apply chat template
print("Preparing input...")
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate response
print("Generating response...")
with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=256,
        temperature=0.9,
        top_p=0.9,
        do_sample=True
    )

# Decode and print the response
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
# Extract the model's answer (remove the prompt)
response_start = output.find(input_text) + len(input_text)
generated_response = output[response_start:].strip()
print("\nGenerated response:\n", generated_response)

Limitations

The model is optimized for survey response generation and may not perform well on other tasks
Response quality depends on the clarity and specificity of the persona and question
The model may occasionally generate responses that don't fully align with the given persona

License

This model follows the license of the base model TinyLlama/TinyLlama-1.1B-Chat-v1.0.