Model Card for Shriharsh/qwen3-0.6b-creative-writing
This model is a fine-tuned version of Qwen3-0.6B, designed for creative writing tasks such as generating stories and dialogues.
Model Details
Model Description
This model is a fine-tuned version of Qwen/Qwen3-0.6B, optimized for creative writing tasks such as storytelling and dialogue generation. It was trained on the Gryphe/ChatGPT-4o-Writing-Prompts dataset to enhance its narrative capabilities.
- Developed by: Shriharsh
- Funded by: Self-funded
- Shared by: Shriharsh
- Model type: Causal Language Model
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: Qwen/Qwen3-0.6B
Model Sources
- Repository: Shriharsh/qwen3-0.6b-creative-writing
- Paper: Not available
- Demo: Not available
Uses
Direct Use
The model can be used directly to generate creative writing content, such as stories, dialogues, and narrative responses, using the `transformers` library.
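For example, a minimal sketch of direct use through the `pipeline` API (the prompt and generation settings here are illustrative, not taken from this model card):

```python
from transformers import pipeline

# Minimal direct-use sketch; the prompt and sampling settings are illustrative.
writer = pipeline("text-generation", model="Shriharsh/qwen3-0.6b-creative-writing")
result = writer(
    "Write a short story about a lighthouse keeper.",
    max_new_tokens=300,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```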
Downstream Use
It can be further fine-tuned for specific creative writing tasks (e.g., scriptwriting, novel generation) or integrated into applications like chatbots or writing assistants.
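As a hedged sketch of what such further fine-tuning could look like, the following attaches fresh LoRA adapters with `peft`; the settings mirror the hyperparameters reported under Training Details, but the exact recipe is an assumption:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the fine-tuned checkpoint and attach new LoRA adapters for a further
# round of task-specific training (e.g., scriptwriting).
base = AutoModelForCausalLM.from_pretrained("Shriharsh/qwen3-0.6b-creative-writing")
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "gate_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```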
Out-of-Scope Use
Not suitable for tasks requiring factual accuracy (e.g., scientific research) or non-English language generation without additional fine-tuning.
Bias, Risks, and Limitations
The model may produce biased or stereotypical narratives, reflecting biases in the training dataset. Given its test loss of 3.0295, it may also generate incoherent or repetitive text, indicating limited generalization.
Recommendations
Users should evaluate outputs for coherence and appropriateness, especially in sensitive contexts. Further fine-tuning or post-processing is recommended to mitigate biases.
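One concrete mitigation for repetitive output is to apply sampling-time penalties. The sketch below assumes `model`, `tokenizer`, and `inputs` are prepared as in the How to Get Started example that follows; the penalty values are illustrative, not tuned:

```python
# Assumes model, tokenizer, and inputs are set up as in the example below.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,   # values slightly above 1.0 discourage verbatim repeats
    no_repeat_ngram_size=3,   # block exact repeats of any 3-gram
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```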
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Shriharsh/qwen3-0.6b-creative-writing"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Write a short story about a lost time traveler who has a big pile of cash from the future which is unusable in the current time where he is now."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)  # move inputs to the same device as the model

# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=2000, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Training Details
Training Data
The model was trained on the Gryphe/ChatGPT-4o-Writing-Prompts dataset, containing 700 training examples, 150 validation examples, and 150 test examples of prompt-response pairs focused on creative writing.
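The exact splitting code is not published; the sketch below shows one plausible way to draw a 700/150/150 split with the `datasets` library (the seed and ordering are assumptions):

```python
from datasets import load_dataset

# Illustrative split only; the actual selection procedure is not documented.
raw = load_dataset("Gryphe/ChatGPT-4o-Writing-Prompts", split="train")
subset = raw.shuffle(seed=42).select(range(1000))
train_ds = subset.select(range(700))
val_ds = subset.select(range(700, 850))
test_ds = subset.select(range(850, 1000))
```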
Training Procedure
Preprocessing
Conversations were formatted into prompt-response pairs using the template `### Human: {prompt} \n\n ### Assistant: {response} \n\n` and tokenized with a maximum length of 512 tokens.
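A minimal sketch of this preprocessing, assuming the dataset exposes `prompt` and `response` fields (the field names are assumptions about the dataset schema):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

def format_and_tokenize(example):
    # Apply the template from the Preprocessing section, then tokenize.
    text = f"### Human: {example['prompt']} \n\n ### Assistant: {example['response']} \n\n"
    return tokenizer(text, truncation=True, max_length=512)
```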
Training Hyperparameters
- Training regime: fp16 mixed precision
- Epochs: 3
- Learning rate: 2e-4
- Batch size: Effective batch size of 4 (per_device_train_batch_size=1, gradient_accumulation_steps=4)
- LoRA parameters: r=8, lora_alpha=32, target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "gate_proj", "down_proj"] (see the configuration sketch below)
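These hyperparameters map onto a `transformers` `TrainingArguments` roughly as follows; the exact training script is not published, so this is a hedged reconstruction (the LoRA settings themselves correspond to the `LoraConfig` sketched under Downstream Use):

```python
from transformers import TrainingArguments

# Reconstructed trainer-level settings; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="qwen3-0.6b-creative-writing",
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size of 4
    fp16=True,                      # fp16 mixed precision
)
```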
Speeds, Sizes, Times
- Training time: ~23 minutes for 525 steps (3 epochs) on a T4 GPU in Google Colab
- Model size: 383M parameters
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluated on 150 test examples from the Gryphe/ChatGPT-4o-Writing-Prompts dataset.
Factors
Assessed for narrative coherence and creativity, though formal metrics beyond loss were not applied.
Metrics
- Loss: cross-entropy loss on held-out examples (a measurement sketch follows)
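The exact evaluation script is not published; a minimal sketch of measuring cross-entropy loss (and the equivalent perplexity) on one formatted example might look like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Shriharsh/qwen3-0.6b-creative-writing"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

# A made-up example in the training template; real evaluation would loop
# over the 150 held-out test examples and average the losses.
text = "### Human: Write a two-line scene. \n\n ### Assistant: The door creaked. Nobody was there. \n\n"
enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])  # labels are shifted internally
print("cross-entropy loss:", out.loss.item())
print("perplexity:", torch.exp(out.loss).item())
```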
Results
- Test Loss: 3.0295 (post-training)
- Validation Loss: 3.0295 at step 525
- Observations: Slight overfitting observed as validation loss increased from 3.0217 (step 350) to 3.0569 (step 500).
Summary
The model performs adequately for creative writing (a test loss of 3.0295 corresponds to a perplexity of roughly exp(3.0295) ≈ 20.7) but shows signs of overfitting and limited generalization.
Environmental Impact
Carbon emissions estimated using the Machine Learning Impact calculator.
- Hardware Type: T4 GPU
- Hours used: ~0.4 hours (23 minutes)
- Cloud Provider: Google Colab
- Compute Region: Unknown
- Carbon Emitted: ~0.02 kg CO2e (estimated based on T4 GPU usage and Colab’s energy mix)
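As a rough sanity check on that figure: a T4 draws about 70 W, so ~0.4 hours of use is roughly 0.03 kWh; at a typical grid intensity of 0.4 to 0.6 kg CO2e/kWh, that works out to about 0.01-0.02 kg CO2e, consistent with the estimate above.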
Technical Specifications
Model Architecture and Objective
Causal language model with LoRA adapters for efficient fine-tuning, aimed at minimizing cross-entropy loss on creative writing tasks.
Compute Infrastructure
Hardware
T4 GPU with ~15GB VRAM, provided by Google Colab. Thanks Google! 🤗
Software
- Transformers: latest version available at training time (not pinned)
- PEFT: for LoRA fine-tuning
- BitsAndBytes: for 4-bit quantization during training (see the loading sketch below)
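A hedged sketch of what 4-bit loading with BitsAndBytes could look like in this setup; the specific quantization settings are assumptions, not confirmed values:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed QLoRA-style 4-bit loading of the base model before fine-tuning.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=bnb_config,
    device_map="auto",
)
```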
Model Card Authors
- Shriharsh
Model Card Contact
- Shriharsh (Hugging Face username: Shriharsh)