πŸ€– gama-12b

gama-12b is a 12-billion parameter language model created through the strategic merge of multiple specialized models. This model combines the capabilities of different architectures to offer a more robust and versatile conversational experience.

πŸ“‹ Overview

This model was developed using the DARE TIES technique, which combines DARE (Drop And REscale) with TIES (TrIm, Elect Sign & Merge): an advanced model-merging methodology that efficiently combines different specializations into a single cohesive model.
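
For intuition: DARE takes each fine-tune's parameter delta relative to the base model, randomly drops a fraction (1 - density) of the entries, and rescales the survivors by 1/density; TIES then resolves sign conflicts between the deltas before they are added back to the base. Below is a minimal, simplified sketch in PyTorch; the dare and dare_ties_merge helpers are hypothetical illustrations, not mergekit's actual implementation.

import torch

def dare(delta, density):
    # Drop And REscale: randomly zero (1 - density) of the delta entries,
    # then rescale the survivors by 1/density to preserve the expected update size.
    mask = (torch.rand_like(delta) < density).to(delta.dtype)
    return delta * mask / density

def dare_ties_merge(base, deltas, densities, weights):
    # Simplified illustration: apply DARE to each fine-tune's delta, weight it,
    # then resolve sign conflicts TIES-style by electing the dominant sign per
    # parameter and keeping only the contributions that agree with it.
    stacked = torch.stack(
        [dare(d, p) * w for d, p, w in zip(deltas, densities, weights)]
    )
    elected_sign = torch.sign(stacked.sum(dim=0))
    agree = (torch.sign(stacked) == elected_sign).to(stacked.dtype)
    return base + (stacked * agree).sum(dim=0)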

πŸ”§ Base Models Used

gama-12b is the result of merging the following models on top of the unsloth/gemma-3-12b-it-qat base:

  β€’ soob3123/amoral-gemma3-12B-v2-qat
  β€’ allura-org/Gemma-3-Glitter-12B
  β€’ soob3123/Veiled-Calla-12B

πŸ› οΈ Merge Tool

The merge was performed using LazyMergekit, a wrapper around mergekit that simplifies the process of merging language models.

βš™οΈ Technical Configuration

Merge Parameters

models:
  - model: soob3123/amoral-gemma3-12B-v2-qat
    parameters:
      density: 0.6
      weight: 0.33

  - model: allura-org/Gemma-3-Glitter-12B
    parameters:
      density: 0.6
      weight: 0.33

  - model: soob3123/Veiled-Calla-12B
    parameters:
      density: 0.6
      weight: 0.34

merge_method: dare_ties
base_model: unsloth/gemma-3-12b-it-qat

parameters:
  normalize: true
  int8_mask: true

device: auto
dtype: float16
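
To reproduce the merge locally, the configuration above can be passed to mergekit's command-line tool (which LazyMergekit uses under the hood). A hedged sketch, assuming the YAML is saved as config.yaml and that your mergekit version supports the --cuda and --copy-tokenizer flags:

pip install -qU mergekit
mergekit-yaml config.yaml ./gama-12b --cuda --copy-tokenizer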

Technical Specifications

  • Architecture: Gemma-3 12B
  • Merge Method: DARE TIES
  • Precision: Float16
  • Quantization: QAT (Quantization Aware Training)
  • Normalization: Enabled
  • Int8 Mask: Enabled

πŸ’» How to Use

Installing Dependencies

pip install -qU transformers accelerate torch

Basic Usage Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# Model configuration
model_name = "rodrigomt/gama-12b"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Prepare the message
messages = [
    {"role": "user", "content": "What is a large language model?"}
]

# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Pipeline configuration (reuses the model and tokenizer loaded above,
# so dtype and device placement do not need to be passed again)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Text generation
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1
)

print(outputs[0]["generated_text"])

Advanced Usage Example

# For more granular control, call generate() directly.
# Tokenize the prompt and move the tensors to the model's device;
# the tokenizer also returns the attention mask.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
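
For interactive use, tokens can also be streamed as they are generated. A brief sketch using the Transformers TextStreamer, reusing the model, tokenizer, and inputs from the example above:

from transformers import TextStreamer

# Print tokens as they are produced, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.no_grad():
    model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        streamer=streamer,
    )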

🎯 Key Features

  • Versatility: Combines capabilities from multiple specialized models
  • Efficiency: Optimized with QAT quantization for better performance
  • Compatibility: Fully compatible with the Transformers library
  • Scalability: Supports deployment on different hardware configurations

⚠️ System Requirements

Recommended Minimums

  • RAM: 32GB
  • VRAM: 24GB (GPU)
  • Storage: 50GB available

Ideal Configuration

  • RAM: 64GB+
  • VRAM: 40GB+ (GPU)
  • GPU: A6000, A100, or higher

πŸ“ License

This model is licensed under the Gemma License.
