amusktweewt/tiny-model-700M-chat
This is a general-purpose transformer-based language model tailored for conversational tasks, story generation, and code-related interactions. It builds upon earlier models in the "tiny" series with increased model size, improved attention efficiency, and optimized training setup.
It scores more than twice as high as the 500M model on my internal benchmark and offers a noticeably better user experience. It knows more facts and is the first model in this series capable of performing basic arithmetic.
Model Details
Model Description
- Model type: LlamaForCausalLM
- Hidden size: 816
- Layers: 26
- Attention heads: 12
- Key/Value heads: 6
- Intermediate size: 9856
- Total Parameters: 706M
- Tokenizer vocab size: 32,768
- Max sequence length: 2048 tokens
- Rotary Positional Encoding: Dynamic (factor: 2.0)
- Activation: SiLU
- Attention Implementation: Flash Attention 2
- Other optimizations:
- Scaled dot-product attention
- Memory-efficient attention
- No bias in MLP or attention layers
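For reference, the architecture above corresponds roughly to the Hugging Face `LlamaConfig` sketched below. This is my reconstruction, not the shipped config file: field names assume a recent `transformers` release (the `rope_scaling` key format and bias flags have changed across versions), and the tied-embedding setting is an assumption, chosen because it makes the layer math sum to roughly the stated 706M parameters.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch of the configuration implied by the numbers above (not the shipped
# config.json). Field names assume a recent transformers release.
config = LlamaConfig(
    vocab_size=32768,
    hidden_size=816,
    num_hidden_layers=26,
    num_attention_heads=12,
    num_key_value_heads=6,            # grouped-query attention
    intermediate_size=9856,
    max_position_embeddings=2048,
    rope_scaling={"type": "dynamic", "factor": 2.0},
    hidden_act="silu",
    attention_bias=False,             # no bias in attention projections
    mlp_bias=False,                   # no bias in the MLP
    tie_word_embeddings=True,         # assumption: tying makes the total ~706M
)

model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")  # ~706M
```

Flash Attention 2 is selected at load time, e.g. `AutoModelForCausalLM.from_pretrained(..., attn_implementation="flash_attention_2")`.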
Training Details
Training Configuration
- Optimizer: AdamW with 8-bit precision (adamw_bnb_8bit)
- Learning rate: 8e-5
- Scheduler: Cosine
- Warmup ratio: 15%
- Weight decay: 0.01
- Batch size: 6 (train), 2 (eval) per device
- Gradient accumulation: 2 steps
- Mixed precision: bfloat16
- Epochs: 1
- Training tokens: 43.6B
- Seed: 42
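As a rough illustration only (this is not the author's training script), the hyperparameters above map onto Hugging Face `TrainingArguments` as sketched below; argument names assume a recent `transformers` release with `bitsandbytes` installed for the 8-bit optimizer, and the output path is hypothetical.

```python
from transformers import TrainingArguments

# Sketch of the stated hyperparameters as TrainingArguments.
# Requires bitsandbytes for the 8-bit AdamW optimizer.
training_args = TrainingArguments(
    output_dir="tiny-model-700M-chat",   # hypothetical output path
    optim="adamw_bnb_8bit",
    learning_rate=8e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.15,
    weight_decay=0.01,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    bf16=True,
    num_train_epochs=1,
    seed=42,
)
```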
Training Hardware
- Hardware: assumed similar to a 4090-class GPU
- Torch Compile: Enabled (inductor backend)
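A minimal sketch of enabling torch.compile with the inductor backend (my example, not the author's script); the Trainer-level flags in the comment assume a recent `transformers` release.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("amusktweewt/tiny-model-700M-chat")

# Compile the model with the inductor backend (PyTorch 2.x)
model = torch.compile(model, backend="inductor")

# When training with the Trainer API, the equivalent flags would be:
#   TrainingArguments(..., torch_compile=True, torch_compile_backend="inductor")
```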
Evaluation
- Perplexity: 2.177
- Eval loss: 0.7776
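The two figures are consistent: for a causal language model, perplexity is the exponential of the mean cross-entropy loss, and exp(0.7776) ≈ 2.176, which matches the reported value up to rounding.

```python
import math

eval_loss = 0.7776
perplexity = math.exp(eval_loss)  # perplexity = exp(mean cross-entropy loss)
print(round(perplexity, 3))       # 2.176, matching the reported 2.177 up to rounding
```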
On my own custom-made benchmark for small models, it achieves the highest score of all my models.
Intelligence Score Comparison
| Model | Intelligence Score |
|---|---|
| Gemma-3-27B (for comparison) | 8.3 |
| tiny-model-700M-chat | 4.42841 |
| tiny-model-141M-chat (unreleased) | 2.7 |
| tiny-model-500M-chat-v2 | 2.50909 |
| tiny-model-500M-chat-v2-5-exp | 2.08295 |
Usage and Applications
Direct Use
This model is suitable for:
- Text and dialogue generation
- Educational tasks
- Code completion and explanation
- Story creation
Not Recommended For
- High factual precision tasks
- Sensitive or critical domains without human supervision
How to Get Started
```python
import torch
from transformers import pipeline, set_seed

# Set up the text-generation pipeline
model_name = "amusktweewt/tiny-model-700M-chat"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1
)

# Ensure that bos_token and eos_token are explicitly set as strings
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

# Set seed for reproducibility (optional)
set_seed(42)

print("Chatbot is ready! Type 'exit' to end the conversation.")

# Initialize the conversation history with the system prompt
conversation_history = [{
    "role": "system",
    "content": (
        "You are a highly intelligent and helpful AI assistant named Tiny Chat, "
        "developed by amusktweewt. Always refer to yourself like that. Your responses "
        "should be clear, concise, and accurate. Always prioritize user needs, provide "
        "well-structured answers, and maintain a friendly yet professional tone. Adapt "
        "to the user's preferences and communication style. When needed, ask clarifying "
        "questions to ensure the best response. Be honest about limitations and avoid "
        "making assumptions. Keep interactions engaging, informative, and efficient."
    ),
}]

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exiting chat. Goodbye!")
        break

    # Append the user message to the conversation history
    conversation_history.append({"role": "user", "content": user_input})

    # Add an empty assistant turn so the chat template ends with the assistant prefix
    messages = conversation_history + [{"role": "assistant", "content": ""}]

    # Use the tokenizer's apply_chat_template() method to format the prompt
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20
    )

    # The returned 'generated_text' includes the prompt plus the generation,
    # so strip the prompt portion to keep only the assistant's reply.
    full_text = response[0]["generated_text"]
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")

    # Keep the assistant's reply in the history so later turns have full context
    conversation_history.append({"role": "assistant", "content": bot_response})
```
Contact
Author: amusktweewt
For issues or feedback, please reach out via my Hugging Face profile.