amusktweewt/tiny-model-700M-chat

This is a general-purpose transformer-based language model tailored for conversational tasks, story generation, and code-related interactions. It builds upon earlier models in the "tiny" series with increased model size, improved attention efficiency, and optimized training setup.

On the internal benchmark reported below, it scores more than twice as high as the 500M model and offers a noticeably better user experience. It knows more facts and is the first model in this series capable of performing basic arithmetic.

Model Details

Model Description

  • Model type: LlamaForCausalLM
  • Hidden size: 816
  • Layers: 26
  • Attention heads: 12
  • Key/Value heads: 6
  • Intermediate size: 9856
  • Total Parameters: 706M
  • Tokenizer vocab size: 32,768
  • Max sequence length: 2048 tokens
  • Rotary Positional Encoding: Dynamic (factor: 2.0)
  • Activation: SiLU
  • Attention Implementation: Flash Attention 2
  • Other optimizations:
    • Scaled dot-product attention
    • Memory-efficient attention
    • No bias in MLP or attention layers
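
For orientation, the bullets above map onto a Hugging Face LlamaConfig roughly as follows. This is a minimal sketch, not the shipped config.json: the field names follow the transformers LlamaConfig API, and the exact values (e.g. the rope_scaling dict) are assumptions inferred from the list above.

from transformers import LlamaConfig

# Assumed reconstruction of the architecture described above.
config = LlamaConfig(
    vocab_size=32768,
    hidden_size=816,
    num_hidden_layers=26,
    num_attention_heads=12,
    num_key_value_heads=6,    # grouped-query attention: 12 query heads share 6 KV heads
    intermediate_size=9856,
    max_position_embeddings=2048,
    hidden_act="silu",
    rope_scaling={"type": "dynamic", "factor": 2.0},  # dynamic RoPE scaling, factor 2.0
    attention_bias=False,     # no bias in attention projections
    mlp_bias=False,           # no bias in MLP layers
)

# Flash Attention 2 is selected at load time rather than in the config, e.g.:
# model = AutoModelForCausalLM.from_pretrained(name, attn_implementation="flash_attention_2")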

Training Details

Training Configuration

  • Optimizer: AdamW with 8-bit precision (adamw_bnb_8bit)
  • Learning rate: 8e-5
  • Scheduler: Cosine
  • Warmup ratio: 15%
  • Weight decay: 0.01
  • Batch size: 6 (train), 2 (eval) per device
  • Gradient accumulation: 2 steps
  • Mixed precision: bfloat16
  • Epochs: 1
  • Training tokens: 43.6B
  • Seed: 42
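
For reference, the settings above can be expressed as transformers TrainingArguments. This is a hedged reconstruction assuming the standard Trainer API was used; output_dir is a placeholder, and the torch_compile fields anticipate the Training Hardware section below.

from transformers import TrainingArguments

# Assumed mapping of the listed hyperparameters onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="tiny-model-700M-chat",  # placeholder path
    optim="adamw_bnb_8bit",             # 8-bit AdamW via bitsandbytes
    learning_rate=8e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.15,
    weight_decay=0.01,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    bf16=True,                          # bfloat16 mixed precision
    num_train_epochs=1,
    seed=42,
    torch_compile=True,                 # inductor backend (see Training Hardware)
    torch_compile_backend="inductor",
)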

Training Hardware

  • Hardware: assumed to be comparable to an RTX 4090-class GPU
  • Torch Compile: Enabled (inductor backend)

Evaluation

  • Perplexity: 2.177
  • Eval loss: 0.7776
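
These two figures are consistent with each other: perplexity is the exponential of the cross-entropy eval loss, so exp(0.7776) ≈ 2.176, matching the reported perplexity up to rounding. A one-line check:

import math

print(math.exp(0.7776))  # ≈ 2.1763, consistent with the reported perplexity of 2.177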

On my own custom-made benchmark for small models, it achieves the highest grade of all my models.

Intelligence Score Comparison

Model                              Intelligence Score
Gemma-3-27B (for comparison)       8.3
tiny-model-700M-chat               4.42841
tiny-model-141M-chat (unreleased)  2.7
tiny-model-500M-chat-v2            2.50909
tiny-model-500M-chat-v2-5-exp      2.08295

Usage and Applications

Direct Use

This model is suitable for:

  • Text and dialogue generation
  • Educational tasks
  • Code completion and explanation
  • Story creation

Not Recommended For

  • High factual precision tasks
  • Sensitive or critical domains without human supervision

How to Get Started

import torch
from transformers import pipeline, set_seed

# Set up the text-generation pipeline
model_name = "amusktweewt/tiny-model-700M-chat"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1
)

# Ensure that bos_token and eos_token are explicitly set as strings
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

# Set seed for reproducibility (optional)
set_seed(42)

print("Chatbot is ready! Type 'exit' to end the conversation.")

# Initialize the conversation history
conversation_history = []

conversation_history.append({"role": "system", "content": "You are a highly intelligent and helpful AI assistant named Tiny Chat, developed by amusktweewt. Always refer to yourself like that. Your responses should be clear, concise, and accurate. Always prioritize user needs, provide well-structured answers, and maintain a friendly yet professional tone. Adapt to the user's preferences and communication style. When needed, ask clarifying questions to ensure the best response. Be honest about limitations and avoid making assumptions. Keep interactions engaging, informative, and efficient."})

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exiting chat. Goodbye!")
        break

    # Append user message to the conversation history
    conversation_history.append({"role": "user", "content": user_input})

    # Prepare the messages with the conversation history and an empty assistant turn
    messages = conversation_history + [{"role": "assistant", "content": ""}]

    # Use the tokenizer's apply_chat_template() method to format the prompt.
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt.
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20
    )

    # The returned 'generated_text' includes the prompt plus the generation.
    full_text = response[0]["generated_text"]
    # Extract the assistant's response by removing the prompt portion.
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")

    # Store the assistant's reply so subsequent turns see the full context
    conversation_history.append({"role": "assistant", "content": bot_response})

Contact

Author: amusktweewt

For issues or feedback, please reach out via the author's Hugging Face profile.
