💻 Use via API

Shuttle-2.5-mini (beta) [2024/07/27]

We are excited to introduce Shuttle-2.5-mini, our next-generation state-of-the-art language model designed to excel in complex chat, multilingual communication, reasoning, and agent tasks.

  • Shuttle-2.5-mini is a fine-tuned version of Mistral-Nemo-Base-2407, emulating the writing style of Claude 3 models and thoroughly trained on role-playing data.

Model Details

Base Model Architecture

Mistral Nemo is a transformer model with the following architecture choices:

  • Layers: 40
  • Dimension: 5,120
  • Head Dimension: 128
  • Hidden Dimension: 14,336
  • Activation Function: SwiGLU
  • Number of Heads: 32
  • Number of kv-heads: 8 (GQA)
  • Vocabulary Size: 2^17 (approximately 128k)
  • Rotary Embeddings: Theta = 1M
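
As a rough sanity check, the figures above can be turned into a parameter estimate. The sketch below assumes a standard decoder layout that this card does not spell out: bias-free linear layers, two norm weight vectors per layer plus a final one, and separate (untied) input and output embeddings. It also takes the feed-forward width as 14,336, which is what the base model's published configuration appears to use. The function and argument names are illustrative only.

```python
def estimate_params(layers=40, dim=5120, head_dim=128, ffn_dim=14336,
                    n_heads=32, n_kv_heads=8, vocab=2**17):
    """Estimate the parameter count of a GQA + SwiGLU decoder (assumed layout)."""
    # Attention: Q and O use all heads; K and V use only the kv-heads (GQA).
    q_proj = dim * n_heads * head_dim
    k_proj = dim * n_kv_heads * head_dim
    v_proj = dim * n_kv_heads * head_dim
    o_proj = n_heads * head_dim * dim
    attention = q_proj + k_proj + v_proj + o_proj

    # SwiGLU feed-forward: gate, up and down projections.
    feed_forward = 3 * dim * ffn_dim

    # Assumption: two norm weight vectors per layer, no biases anywhere.
    norms = 2 * dim

    per_layer = attention + feed_forward + norms

    # Assumption: input embedding and output head are separate matrices.
    embeddings = 2 * vocab * dim
    final_norm = dim

    return layers * per_layer + embeddings + final_norm


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Under these assumptions the estimate comes to roughly 12.2B parameters.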

Key Features

  • Released under the Apache 2 License
  • Trained with a 128k context window
  • Pretrained on a large proportion of multilingual and code data
  • Fine-tuned to emulate the prose quality of Claude 3 models, with extensive training on role-play data
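
To give a sense of what a 128k context window costs at inference time, the sketch below estimates the key/value cache size from the architecture figures listed earlier (40 layers, 8 kv-heads, head dimension 128). It assumes "128k" means 131,072 tokens, that the cache is stored in BF16 (2 bytes per value), and that there is a single sequence in the batch. The function name is hypothetical.

```python
def kv_cache_bytes(seq_len, layers=40, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one tensor for keys and one for values in every layer.
    per_token = 2 * layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len


full_context = 128 * 1024  # assumption: 128k = 131,072 tokens

gqa = kv_cache_bytes(full_context)                 # 8 kv-heads (GQA)
mha = kv_cache_bytes(full_context, n_kv_heads=32)  # hypothetical: no GQA

print(f"GQA cache:        {gqa / 2**30:.0f} GiB")
print(f"Without GQA:      {mha / 2**30:.0f} GiB")
```

With 8 kv-heads instead of 32, the cache at full context is a quarter of the size it would be with one kv-head per attention head.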

Fine-Tuning Details

  • Training Setup: Trained on 200 million tokens for 24 hours across 2 epochs using 4 A100 PCIe GPUs.

Prompting

Shuttle-2.5-mini uses ChatML as its prompting format:

<|im_start|>system
You are a pirate! Yardy harr harr!<|im_end|>
<|im_start|>user
Where are you currently?<|im_end|>
<|im_start|>assistant
Look ahoy ye scallywag! We're on the high seas!<|im_end|>
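
A minimal sketch of assembling a prompt in this format from a list of messages follows. The helper name `format_chatml` and its `add_generation_prompt` flag are illustrative, not part of any ShuttleAI API; the layout assumed is one `<|im_start|>role` header line, the content, then `<|im_end|>` and a newline per message.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are a pirate! Yardy harr harr!"},
    {"role": "user", "content": "Where are you currently?"},
])
print(prompt)
```

When generating, stopping on `<|im_end|>` keeps the model from continuing into a new turn.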

Model size: 12.2B params (Safetensors)
Tensor type: BF16