Model Card for Shriharsh/qwen3-0.6b-creative-writing
This model is a fine-tuned version of Qwen3-0.6B, designed for creative writing tasks such as generating stories and dialogues.
Model Details
Model Description
This model is a fine-tuned version of Qwen/Qwen3-0.6B, optimized for creative writing tasks such as storytelling and dialogue generation. It was trained on the Gryphe/ChatGPT-4o-Writing-Prompts dataset to enhance its narrative capabilities.
- Developed by: Shriharsh
- Funded by: Self-funded
- Shared by: Shriharsh
- Model type: Causal Language Model
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: Qwen/Qwen3-0.6B
Model Sources
- Repository: Shriharsh/qwen3-0.6b-creative-writing
- Paper: Not available
- Demo: Not available
Uses
Direct Use
The model can be used directly to generate creative writing content, such as stories, dialogues, and narrative responses, using the `transformers` library.
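For example, a minimal sketch of direct use through the `pipeline` API (the prompt and generation settings here are illustrative, not taken from this model card):

```python
from transformers import pipeline

# Minimal direct-use sketch; the prompt and sampling settings are illustrative.
writer = pipeline("text-generation", model="Shriharsh/qwen3-0.6b-creative-writing")
result = writer(
    "Write a short story about a lighthouse keeper.",
    max_new_tokens=300,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```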
Downstream Use
It can be further fine-tuned for specific creative writing tasks (e.g., scriptwriting, novel generation) or integrated into applications like chatbots or writing assistants.
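As a hedged sketch of what such further fine-tuning could look like, the following attaches fresh LoRA adapters with `peft`; the settings mirror the hyperparameters reported under Training Details, but the exact recipe is an assumption:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the fine-tuned checkpoint and attach new LoRA adapters for a further
# round of task-specific training (e.g., scriptwriting).
base = AutoModelForCausalLM.from_pretrained("Shriharsh/qwen3-0.6b-creative-writing")
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "gate_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```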
Out-of-Scope Use
Not suitable for tasks requiring factual accuracy (e.g., scientific research) or non-English language generation without additional fine-tuning.
Bias, Risks, and Limitations
The model may produce biased or stereotypical narratives, reflecting biases in the training dataset. Given its test loss of 3.0295, it may also generate incoherent or repetitive text, indicating limited generalization.
Recommendations
Users should evaluate outputs for coherence and appropriateness, especially in sensitive contexts. Further fine-tuning or post-processing is recommended to mitigate biases.
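One concrete mitigation for repetitive output is to apply sampling-time penalties. The sketch below assumes `model`, `tokenizer`, and `inputs` are prepared as in the How to Get Started example that follows; the penalty values are illustrative, not tuned:

```python
# Assumes model, tokenizer, and inputs are set up as in the example below.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,   # values slightly above 1.0 discourage verbatim repeats
    no_repeat_ngram_size=3,   # block exact repeats of any 3-gram
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```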
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Shriharsh/qwen3-0.6b-creative-writing"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Write a short story about a lost time traveler who has a big pile of cash from the future which is unusable in the current time where he is now."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)  # move inputs to the same device as the model

# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=2000, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Training Details
Training Data
The model was trained on the Gryphe/ChatGPT-4o-Writing-Prompts dataset, containing 700 training examples, 150 validation examples, and 150 test examples of prompt-response pairs focused on creative writing.
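The exact splitting code is not published; the sketch below shows one plausible way to draw a 700/150/150 split with the `datasets` library (the seed and ordering are assumptions):

```python
from datasets import load_dataset

# Illustrative split only; the actual selection procedure is not documented.
raw = load_dataset("Gryphe/ChatGPT-4o-Writing-Prompts", split="train")
subset = raw.shuffle(seed=42).select(range(1000))
train_ds = subset.select(range(700))
val_ds = subset.select(range(700, 850))
test_ds = subset.select(range(850, 1000))
```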
Training Procedure
Preprocessing
Conversations were formatted into prompt-response pairs using the template `### Human: {prompt} \n\n ### Assistant: {response} \n\n` and tokenized with a maximum length of 512 tokens.
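A minimal sketch of this preprocessing, assuming the dataset exposes `prompt` and `response` fields (the field names are assumptions about the dataset schema):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

def format_and_tokenize(example):
    # Apply the template from the Preprocessing section, then tokenize.
    text = f"### Human: {example['prompt']} \n\n ### Assistant: {example['response']} \n\n"
    return tokenizer(text, truncation=True, max_length=512)
```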
Training Hyperparameters
- Training regime: fp16 mixed precision
- Epochs: 3
- Learning rate: 2e-4
- Batch size: Effective batch size of 4 (per_device_train_batch_size=1, gradient_accumulation_steps=4)
- LoRA parameters: r=8, lora_alpha=32, target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "gate_proj", "down_proj"] (see the configuration sketch below)
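These hyperparameters map onto a `transformers` `TrainingArguments` roughly as follows; the exact training script is not published, so this is a hedged reconstruction (the LoRA settings themselves correspond to the `LoraConfig` sketched under Downstream Use):

```python
from transformers import TrainingArguments

# Reconstructed trainer-level settings; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="qwen3-0.6b-creative-writing",
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size of 4
    fp16=True,                      # fp16 mixed precision
)
```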
Speeds, Sizes, Times
- Training time: ~23 minutes for 525 steps (3 epochs) on a T4 GPU in Google Colab
- Model size: 383M parameters
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluated on 150 test examples from the Gryphe/ChatGPT-4o-Writing-Prompts dataset.
Factors
Assessed for narrative coherence and creativity, though formal metrics beyond loss were not applied.
Metrics
- Loss: cross-entropy loss on held-out examples (a measurement sketch follows)
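The exact evaluation script is not published; a minimal sketch of measuring cross-entropy loss (and the equivalent perplexity) on one formatted example might look like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Shriharsh/qwen3-0.6b-creative-writing"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

# A made-up example in the training template; real evaluation would loop
# over the 150 held-out test examples and average the losses.
text = "### Human: Write a two-line scene. \n\n ### Assistant: The door creaked. Nobody was there. \n\n"
enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])  # labels are shifted internally
print("cross-entropy loss:", out.loss.item())
print("perplexity:", torch.exp(out.loss).item())
```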
Results
- Test Loss: 3.0295 (post-training)
- Validation Loss: 3.0295 at step 525
- Observations: Slight overfitting observed as validation loss increased from 3.0217 (step 350) to 3.0569 (step 500).
Summary
The model performs adequately for creative writing (a test loss of 3.0295 corresponds to a perplexity of roughly exp(3.0295) ≈ 20.7) but shows signs of overfitting and limited generalization.
Environmental Impact
Carbon emissions estimated using the Machine Learning Impact calculator.
- Hardware Type: T4 GPU
- Hours used: ~0.4 hours (23 minutes)
- Cloud Provider: Google Colab
- Compute Region: Unknown
- Carbon Emitted: ~0.02 kg CO2e (estimated based on T4 GPU usage and Colab’s energy mix)
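As a rough sanity check on that figure: a T4 draws about 70 W, so ~0.4 hours of use is roughly 0.03 kWh; at a typical grid intensity of 0.4 to 0.6 kg CO2e/kWh, that works out to about 0.01-0.02 kg CO2e, consistent with the estimate above.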
Technical Specifications
Model Architecture and Objective
Causal language model with LoRA adapters for efficient fine-tuning, aimed at minimizing cross-entropy loss on creative writing tasks.
Compute Infrastructure
Hardware
T4 GPU with ~15GB VRAM, provided by Google Colab. Thanks Google! 🤗
Software
- Transformers: latest version available at training time (not pinned)
- PEFT: for LoRA fine-tuning
- BitsAndBytes: for 4-bit quantization during training (see the loading sketch below)
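A hedged sketch of what 4-bit loading with BitsAndBytes could look like in this setup; the specific quantization settings are assumptions, not confirmed values:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed QLoRA-style 4-bit loading of the base model before fine-tuning.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=bnb_config,
    device_map="auto",
)
```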
Model Card Authors
- Shriharsh
Model Card Contact
- Shriharsh (Hugging Face username: Shriharsh)