gama-4b
gama-4b is an efficient 4-billion-parameter language model optimized for multilingual conversation, with a focus on Portuguese and English. It combines specialized capabilities through a strategic merge of complementary models.
Overview
This model was developed using the DARE TIES merge technique (Drop And REscale combined with TIES-style sign election), combining specialized models into a compact and versatile solution for conversational applications in Portuguese and English.
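For intuition, the sketch below shows roughly what a DARE TIES merge does per parameter tensor: each model's task vector (its weights minus the base model's) is randomly sparsified and rescaled, then contributions that agree with the elected sign are combined. This is a simplification for illustration only; the names (dare_sparsify, base, deltas) are made up here, and the actual merge is performed by mergekit with the configuration shown further down.

import torch

def dare_sparsify(delta: torch.Tensor, density: float) -> torch.Tensor:
    """Keep each entry of the task vector with probability `density`, then rescale the survivors."""
    mask = (torch.rand_like(delta) < density).to(delta.dtype)  # random drop mask
    return delta * mask / density                              # rescaling keeps the expected value unchanged

# Toy example on a single parameter tensor (values are hypothetical)
base = torch.zeros(4)
deltas = [torch.tensor([0.4, -0.2, 0.1, 0.3]),    # model A weights minus base weights
          torch.tensor([-0.1, 0.5, -0.3, 0.2])]   # model B weights minus base weights
weights = [0.5, 0.5]

sparse = [dare_sparsify(d, density=0.6) for d in deltas]

# TIES-style sign election: keep only contributions that agree with the dominant sign
elected = torch.sign(sum(w * d for w, d in zip(weights, sparse)))
merged_delta = sum(w * d * (torch.sign(d) == elected).to(d.dtype) for w, d in zip(weights, sparse))
merged_weights = base + merged_delta
print(merged_weights)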
Key Features
- Bilingual: Optimized for Brazilian Portuguese and English
- Efficient: Only 4B parameters for fast deployment
- Quantized: QAT (Quantization Aware Training) for a better performance-to-size trade-off
Base Models Used
gama-4b is the result of a strategic merge of the following models:
- CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it
- soob3123/Veiled-Calla-4B
- soob3123/amoral-gemma3-4B-v2-qat

with unsloth/gemma-3-4b-it-qat as the base model.
Merge Tool
The merge was performed using LazyMergekit, which streamlines merging language models with advanced configurations.
Technical Configuration
Merge Parameters
models:
  - model: CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it
    parameters:
      density: 0.6
      weight: 0.34
  - model: soob3123/Veiled-Calla-4B
    parameters:
      density: 0.6
      weight: 0.33
  - model: soob3123/amoral-gemma3-4B-v2-qat
    parameters:
      density: 0.6
      weight: 0.33
merge_method: dare_ties
base_model: unsloth/gemma-3-4b-it-qat
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
Technical Specifications
- Architecture: Gemma-3 4B
- Merge Method: DARE TIES
- Precision: BFloat16
- Quantization: QAT (Quantization Aware Training)
- Normalization: Enabled
- Int8 Mask: Enabled
- Languages: Portuguese (PT-BR) and English
How to Use
Installing Dependencies
pip install -qU transformers accelerate torch
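Optionally, a quick sanity check that the dependencies import correctly and whether a GPU is visible:

import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())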
Basic Usage Example
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch
# Model configuration
model_name = "rodrigomt/gama-4b"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
# Example in Portuguese
messages_pt = [
    {"role": "user", "content": "O que é um modelo de linguagem grande?"}
]
# Example in English
messages_en = [
    {"role": "user", "content": "What is a large language model?"}
]
# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages_pt,
    tokenize=False,
    add_generation_prompt=True
)
# Pipeline configuration
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Text generation
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1
)
print(outputs[0]["generated_text"])
Multilingual Usage Example
# Conversation switching languages
conversation = [
    {"role": "user", "content": "Olá! Como você está?"},
    {"role": "assistant", "content": "Olá! Estou bem, obrigado por perguntar. Como posso ajudar você hoje?"},
    {"role": "user", "content": "Can you switch to English?"},
    {"role": "assistant", "content": "Of course! I can communicate in both Portuguese and English. How can I help you?"}
]
prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
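For longer conversations, each reply can simply be appended to the message list before the next turn. A minimal sketch, reusing the tokenizer and pipeline created above (return_full_text=False makes the pipeline return only the newly generated text):

# Simple interactive chat loop (type "exit" to stop)
conversation = []
while True:
    user_input = input("You: ")
    if user_input.strip().lower() == "exit":
        break
    conversation.append({"role": "user", "content": user_input})
    prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
    outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, return_full_text=False)
    reply = outputs[0]["generated_text"].strip()
    print("Assistant:", reply)
    conversation.append({"role": "assistant", "content": reply})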
Advanced Usage Example
# For more granular control over generation
def generate_response(prompt_text, max_tokens=256, temperature=0.7):
    # Tokenize and move the inputs (input_ids + attention_mask) to the model's device
    inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            do_sample=True,
            temperature=temperature,
            top_k=50,
            top_p=0.95,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response
# Using the function
response = generate_response("Explain machine learning in simple terms:")
print(response)
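For interactive demos it can be nicer to stream tokens as they are produced instead of waiting for the full response. A minimal sketch using transformers' TextStreamer with the model and tokenizer already loaded above:

from transformers import TextStreamer

# Print tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("Explain machine learning in simple terms:", return_tensors="pt").to(model.device)
_ = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    streamer=streamer,
)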
System Requirements
Minimum Configuration
- RAM: 16GB
- VRAM: 8GB (GPU); for smaller GPUs, see the 4-bit loading sketch after the recommended configuration
- Storage: 20GB available
- GPU: RTX 3070 or higher
Recommended Configuration
- RAM: 32GB
- VRAM: 16GB (GPU)
- GPU: RTX 4070, A4000 or higher
- CPU: Modern multi-core processor
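If your GPU has less VRAM than listed above, the model can usually still be loaded in 4-bit via bitsandbytes, trading some output quality for memory. A minimal sketch (assumes pip install bitsandbytes and a CUDA GPU):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model_name = "rodrigomt/gama-4b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)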
Advanced Settings
Temperature Adjustment
# More creative responses
outputs = pipeline(prompt, do_sample=True, temperature=0.9, top_p=0.95)
# More conservative responses
outputs = pipeline(prompt, do_sample=True, temperature=0.3, top_k=30)
Repetition Control
# Reduce repetitions
outputs = pipeline(prompt, repetition_penalty=1.2, no_repeat_ngram_size=3)
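For reproducible outputs (e.g. in tests or evaluations), sampling can be disabled entirely:

# Deterministic (greedy) decoding
outputs = pipeline(prompt, max_new_tokens=256, do_sample=False)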
License
This model is licensed under the Gemma License.