πŸ€– gama-4b

gama-4b is an efficient 4-billion parameter language model, specially optimized for multilingual conversation with a focus on Portuguese and English. This model combines specialized capabilities through a strategic merge of complementary models.

πŸ“‹ Overview

This model was developed with the DARE-TIES merge method, which combines DARE (Drop And REscale of parameter deltas) with TIES-style sign-consensus merging, blending specialized models into a compact and versatile solution for conversational applications in Portuguese and English.
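As a rough intuition (a conceptual sketch, not the actual mergekit implementation): DARE keeps only a random fraction of each model's parameter deltas relative to the base and rescales the survivors, and TIES then resolves sign conflicts before the deltas are combined onto the base model. A minimal illustration of the drop-and-rescale step, assuming the density of 0.6 used in the configuration further below:

import torch

def dare_drop_and_rescale(delta: torch.Tensor, density: float = 0.6) -> torch.Tensor:
    # Keep each delta value with probability `density`, zero out the rest,
    # and rescale survivors by 1/density so the expected magnitude is preserved
    mask = (torch.rand_like(delta) < density).to(delta.dtype)
    return delta * mask / density

# Toy example: the delta is a fine-tuned model's weights minus the base weights
base = torch.randn(4, 4)
finetuned = base + 0.01 * torch.randn(4, 4)
merged = base + dare_drop_and_rescale(finetuned - base, density=0.6)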

🌟 Key Features

  • πŸ’¬ Bilingual: Optimized for Brazilian Portuguese and English
  • ⚑ Efficient: Only 4B parameters for fast deployment
  • πŸ”§ Quantization-aware: built on a QAT (Quantization Aware Training) base for a better performance/size trade-off

πŸ”§ Base Models Used

gama-4b is the result of a strategic merge of the following models:

  • CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it
  • soob3123/Veiled-Calla-4B
  • soob3123/amoral-gemma3-4B-v2-qat

with unsloth/gemma-3-4b-it-qat as the base model (see the full configuration below).

πŸ› οΈ Merge Tool

The merge was performed with LazyMergekit, a convenience wrapper around mergekit that simplifies merging language models with advanced configurations.

βš™οΈ Technical Configuration

Merge Parameters

models:
  - model: CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it
    parameters:
      density: 0.6
      weight: 0.34

  - model: soob3123/Veiled-Calla-4B
    parameters:
      density: 0.6
      weight: 0.33

  - model: soob3123/amoral-gemma3-4B-v2-qat
    parameters:
      density: 0.6
      weight: 0.33

merge_method: dare_ties
base_model: unsloth/gemma-3-4b-it-qat

parameters:
  normalize: true
  int8_mask: true

dtype: bfloat16

Technical Specifications

  • Architecture: Gemma-3 4B
  • Merge Method: DARE TIES
  • Precision: BFloat16
  • Quantization: QAT (Quantization Aware Training)
  • Normalization: Enabled
  • Int8 Mask: Enabled
  • Languages: Portuguese (PT-BR) and English

πŸ’» How to Use

Installing Dependencies

pip install -qU transformers accelerate torch

Basic Usage Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# Model configuration
model_name = "rodrigomt/gama-4b"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Example in Portuguese
messages_pt = [
    {"role": "user", "content": "What is a large language model?"}
]

# Example in English
messages_en = [
    {"role": "user", "content": "What is a large language model?"}
]

# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages_pt,
    tokenize=False,
    add_generation_prompt=True
)

# Pipeline configuration (dtype and device were already set when the model was loaded)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Text generation
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1
)

print(outputs[0]["generated_text"])
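By default the text-generation pipeline returns the prompt together with the completion. If you only want the model's reply, the standard return_full_text argument can be used:

# Return only the newly generated text, without echoing the prompt
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    return_full_text=False,
)
print(outputs[0]["generated_text"])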

Multilingual Usage Example

# Conversation that switches between Portuguese and English
conversation = [
    {"role": "user", "content": "OlΓ‘! Como vocΓͺ estΓ‘?"},
    {"role": "assistant", "content": "OlΓ‘! Estou bem, obrigado por perguntar. Como posso ajudar vocΓͺ hoje?"},
    {"role": "user", "content": "Can you switch to English?"},
    {"role": "assistant", "content": "Of course! I can communicate in both Portuguese and English. How can I help you?"}
]

prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
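To continue the dialogue, append the model's reply and the next user turn to the conversation list and re-apply the chat template (a minimal sketch; the follow-up question is only an illustration):

# The pipeline echoes the prompt by default, so slice off the prompt to get the reply
reply = outputs[0]["generated_text"][len(prompt):]

# Append the reply and a hypothetical follow-up turn, then generate again
conversation.append({"role": "assistant", "content": reply})
conversation.append({"role": "user", "content": "Obrigado! Can you summarize our conversation?"})

prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"][len(prompt):])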

Advanced Usage Example

# For more granular control over generation
def generate_response(prompt_text, max_tokens=256, temperature=0.7):
    # Wrap the raw prompt in the model's chat template
    messages = [{"role": "user", "content": prompt_text}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    # Tokenize and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            do_sample=True,
            temperature=temperature,
            top_k=50,
            top_p=0.95,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens
    response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    return response

# Using the function
response = generate_response("Explain machine learning in simple terms:")
print(response)

⚠️ System Requirements

Minimum Configuration

  • RAM: 16GB
  • VRAM: 8GB (GPU)
  • Storage: 20GB available
  • GPU: RTX 3070 or higher

Recommended Configuration

  • RAM: 32GB
  • VRAM: 16GB (GPU)
  • GPU: RTX 4070, A4000 or higher
  • CPU: Modern multi-core processor
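If you are below the minimum VRAM, a common workaround (a generic transformers + bitsandbytes technique, not something specific to this model) is to load the weights in 4-bit:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Requires: pip install bitsandbytes
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "rodrigomt/gama-4b",
    quantization_config=quant_config,
    device_map="auto",
)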

πŸ”§ Advanced Settings

Temperature Adjustment

# More creative responses
outputs = pipeline(prompt, do_sample=True, temperature=0.9, top_p=0.95)

# More conservative responses
outputs = pipeline(prompt, do_sample=True, temperature=0.3, top_k=30)

Repetition Control

# Reduce repetitions
outputs = pipeline(prompt, repetition_penalty=1.2, no_repeat_ngram_size=3)
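If the same settings are reused across calls, they can be bundled into a transformers GenerationConfig and passed to model.generate (a sketch using the values from the examples above):

from transformers import GenerationConfig

# Reusable sampling settings, mirroring the examples above
gen_config = GenerationConfig(
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, generation_config=gen_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))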

πŸ“ License

This model is licensed under the Gemma License.
