# 🤖 gama-4b

**gama-4b** is a compact 4-billion-parameter language model tuned for **multilingual** conversation, with a focus on **Brazilian Portuguese and English**. It combines the strengths of several complementary Gemma-3 4B fine-tunes by merging their weights into a single model.

## 📋 Overview

The model was built with the **DARE TIES** merge method. This method applies DARE (Drop And REscale) sparsification to each model's weight changes and then uses the sign-consensus step of TIES-Merging (TrIm, Elect Sign & Merge). The goal is a compact, versatile model for conversational applications in Portuguese and English.

### 🌟 Key Features

- **💬 Bilingual:** optimized for Brazilian Portuguese and English
- **⚡ Efficient:** only 4B parameters, which keeps deployment lightweight
- **🔧 QAT-based:** built on quantization-aware-trained (QAT) Gemma-3 checkpoints, which are designed to hold up well when quantized

### 🔧 Merged Models

**gama-4b** merges the following fine-tunes on top of the base model `unsloth/gemma-3-4b-it-qat`:

- **[CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it](https://huggingface.co/CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it)**
- **[soob3123/Veiled-Calla-4B](https://huggingface.co/soob3123/Veiled-Calla-4B)**
- **[soob3123/amoral-gemma3-4B-v2-qat](https://huggingface.co/soob3123/amoral-gemma3-4B-v2-qat)**

### 🛠️ Merge Tool

The merge was performed with **[LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing)**, a Colab notebook that runs mergekit from a YAML merge configuration.

## ⚙️ Technical Configuration

### Merge Parameters

```yaml
models:
  - model: CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it
    parameters:
      density: 0.6
      weight: 0.34
  - model: soob3123/Veiled-Calla-4B
    parameters:
      density: 0.6
      weight: 0.33
  - model: soob3123/amoral-gemma3-4B-v2-qat
    parameters:
      density: 0.6
      weight: 0.33
merge_method: dare_ties
base_model: unsloth/gemma-3-4b-it-qat
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
```
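
The following NumPy sketch shows what the `density`, `weight` and `normalize` settings do. It applies a simplified DARE-TIES merge to a single parameter tensor. This is a hypothetical illustration: the functions `dare` and `dare_ties_merge` are not part of mergekit. The sketch also leaves out details that mergekit handles, such as `int8_mask`, which only affects memory use.

```python
import numpy as np

def dare(delta, density, rng):
    """DARE: keep each entry with probability `density`, rescaling survivors by 1/density."""
    keep = rng.random(delta.shape) < density
    return np.where(keep, delta / density, 0.0)

def dare_ties_merge(base, finetuned, weights, density=0.6, seed=0):
    """Simplified DARE-TIES merge of several fine-tuned tensors onto a shared base tensor."""
    rng = np.random.default_rng(seed)
    base = np.asarray(base, dtype=float)
    # 1. Task vectors (fine-tuned minus base), sparsified and rescaled by DARE
    deltas = np.stack([dare(np.asarray(ft, dtype=float) - base, density, rng) for ft in finetuned])
    # 2. Scale each model's task vector by its merge weight
    w = np.asarray(weights, dtype=float).reshape(-1, *([1] * base.ndim))
    weighted = deltas * w
    # 3. TIES sign election: each parameter takes the sign of the summed contributions
    elected = np.sign(weighted.sum(axis=0))
    agree = (np.sign(weighted) == elected) & (weighted != 0)
    # 4. Sum agreeing contributions; normalize by the total weight that agreed
    total = np.where(agree, weighted, 0.0).sum(axis=0)
    norm = np.where(agree, w, 0.0).sum(axis=0)
    merged = np.divide(total, norm, out=np.zeros_like(base), where=norm > 0)
    return base + merged

# Two hypothetical fine-tunes of a 4-element base tensor, with no dropping (density=1.0)
base = np.zeros(4)
ft_a = np.array([1.0, 1.0, -1.0, 2.0])
ft_b = np.array([1.0, -3.0, -1.0, 2.0])
print(dare_ties_merge(base, [ft_a, ft_b], weights=[0.5, 0.5], density=1.0))
```

The second coordinate shows the TIES step at work. The two models disagree (+1 vs. -3) and the weighted sum is negative. So only the agreeing -3 survives, instead of being averaged to -1.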
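
### Technical Specifications

- **Architecture:** Gemma-3 4B
- **Merge method:** DARE TIES
- **Precision:** bfloat16
- **Quantization:** QAT (Quantization-Aware Training) base checkpoint; the merged weights are stored unquantized in bfloat16
- **Normalization:** enabled (merge weights are normalized)
- **Int8 mask:** enabled (mergekit stores intermediate masks as int8 to save memory)
- **Languages:** Portuguese (PT-BR) and English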

## 💻 How to Use

### Installing Dependencies

```bash
pip install -qU transformers accelerate torch
```

### Basic Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# Model configuration
model_name = "rodrigomt/gama-4b"

# Load tokenizer and model (bfloat16 weights, placed automatically on available devices)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Example in Portuguese ("What is a large language model?")
messages_pt = [
    {"role": "user", "content": "O que é um modelo de linguagem grande?"}
]

# Example in English
messages_en = [
    {"role": "user", "content": "What is a large language model?"}
]

# Apply the chat template (swap in messages_en to prompt in English)
prompt = tokenizer.apply_chat_template(
    messages_pt,
    tokenize=False,
    add_generation_prompt=True
)

# Pipeline configuration (the model is already placed on its devices)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Text generation
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    return_full_text=False,  # return only the newly generated text
)
print(outputs[0]["generated_text"])
```
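
### Multilingual Usage Example

```python
# Conversation that switches languages mid-way
conversation = [
    {"role": "user", "content": "Olá! Como você está?"},  # "Hello! How are you?"
    {"role": "assistant", "content": "Olá! Estou bem, obrigado por perguntar. Como posso ajudá-lo hoje?"},  # "Hello! I'm well, thanks for asking. How can I help you today?"
    {"role": "user", "content": "Can you switch to English?"},
    {"role": "assistant", "content": "Of course! I can communicate in both Portuguese and English. How can I help you?"},
    # End on a user turn so the model has a message to answer
    {"role": "user", "content": "Please answer in Portuguese: what is machine learning?"},
]

prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
```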
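
`apply_chat_template(..., tokenize=False)` turns a conversation into one prompt string, using the template bundled with the tokenizer. The hypothetical helper below gives a rough idea of what that string looks like. It approximates Gemma's turn format: `<start_of_turn>`/`<end_of_turn>` markers, `assistant` mapped to `model`, and strictly alternating roles. This is an assumption based on the published Gemma template. The tokenizer's own template remains the authoritative version.

```python
def render_gemma_chat(messages, add_generation_prompt=True, bos="<bos>"):
    """Approximate Gemma's chat format for plain-string messages (hypothetical helper)."""
    messages = list(messages)
    system_prefix = ""
    # Gemma has no system turn: a leading system message is folded into the first user turn
    if messages and messages[0]["role"] == "system":
        system_prefix = messages[0]["content"] + "\n\n"
        messages = messages[1:]
    parts = [bos]
    for i, msg in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError("Conversation roles must alternate user/assistant/user/...")
        role = "model" if msg["role"] == "assistant" else "user"
        content = msg["content"].strip()
        if i == 0:
            content = system_prefix + content
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant
        parts.append("<start_of_turn>model\n")
    return "".join(parts)

print(render_gemma_chat([
    {"role": "user", "content": "Olá! Como você está?"},
    {"role": "assistant", "content": "Olá! Estou bem."},
    {"role": "user", "content": "Can you switch to English?"},
]))
```

Only plain-string `content` is handled. Gemma 3's multimodal content lists are outside the scope of this sketch.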
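
### Advanced Usage Example

```python
# For finer-grained control over generation, call model.generate directly
def generate_response(prompt_text, max_tokens=256, temperature=0.7):
    # The tokenizer returns both input_ids and attention_mask; move them to the model's device
    inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            do_sample=True,  # required for temperature to take effect
            temperature=temperature,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    # Decode only the new tokens, skipping the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return response

# Using the function
response = generate_response("Explain machine learning in simple terms:")
print(response)
```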

## ⚠️ System Requirements

### Minimum Configuration

- **RAM:** 16 GB
- **VRAM:** 8 GB (GPU)
- **Storage:** 20 GB available
- **GPU:** RTX 3070 or higher

### Recommended Configuration

- **RAM:** 32 GB
- **VRAM:** 16 GB (GPU)
- **GPU:** RTX 4070, A4000 or higher
- **CPU:** modern multi-core processor

### Production Deployment

- **RAM:** 32 GB+
- **VRAM:** 24 GB+ (GPU)
- **GPU:** A6000, A100 or higher for high-concurrency serving
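
A back-of-the-envelope estimate of the memory taken by the weights alone helps put the VRAM figures above in context. The sketch makes these assumptions:
- It uses the nominal 4B parameter count. The actual checkpoint may be somewhat larger.
- It ignores the KV cache, activations and framework overhead, all of which add to the total.
- The 4-bit entry ignores quantization scale overhead.

```python
# Approximate storage cost per parameter for common formats (4-bit ignores scale/zero-point overhead)
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params, dtype="bfloat16"):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in ("float32", "bfloat16", "int4"):
    print(f"{dtype}: ~{weight_memory_gb(4e9, dtype):.1f} GB")
```

At bfloat16 the weights alone take about 8 GB. That leaves little headroom on an 8 GB card once the KV cache is added.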

## 🔧 Advanced Settings

### Temperature Adjustment

```python
# More creative responses (sampling must be enabled for these settings to apply)
outputs = pipeline(prompt, do_sample=True, temperature=0.9, top_p=0.95)

# More conservative responses
outputs = pipeline(prompt, do_sample=True, temperature=0.3, top_k=30)
```
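
The minimal NumPy sketch below applies `temperature`, `top_k` and `top_p` to a toy logit vector to show how each one reshapes the next-token distribution. It follows the usual order: temperature scaling, then top-k, then top-p (nucleus) filtering. It is a simplified stand-in for the logits warpers inside `transformers`, not their code. `sampling_distribution` is a hypothetical name.

```python
import numpy as np

def sampling_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Turn raw logits into sampling probabilities (hypothetical, simplified helper)."""
    scores = np.asarray(logits, dtype=float) / temperature  # <1 sharpens, >1 flattens
    if top_k is not None and top_k < scores.size:
        # Keep only tokens scoring at least as high as the k-th best
        kth_best = np.sort(scores)[-top_k]
        scores = np.where(scores < kth_best, -np.inf, scores)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    if top_p is not None and top_p < 1.0:
        # Nucleus: keep the smallest set of most likely tokens whose mass reaches top_p
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        keep = np.zeros(probs.shape, dtype=bool)
        keep[order[:cutoff]] = True
        probs = np.where(keep, probs, 0.0)
        probs /= probs.sum()
    return probs

toy_logits = [2.0, 1.0, 0.5, -1.0]
print(sampling_distribution(toy_logits, temperature=0.9, top_p=0.95))  # "more creative"
print(sampling_distribution(toy_logits, temperature=0.3, top_k=30))    # "more conservative"
```

A lower temperature concentrates probability on the top token. `top_k` and `top_p` zero out the unlikely tail before a token is sampled.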

### Repetition Control

```python
# Reduce repetitions
outputs = pipeline(prompt, repetition_penalty=1.2, no_repeat_ngram_size=3)
```
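
Roughly speaking, the two settings work as follows:
- `repetition_penalty` rescales the logits of tokens that already appear in the sequence.
- `no_repeat_ngram_size` forbids any token that would complete an n-gram seen before.

The sketch below re-implements that behavior on plain Python lists and NumPy arrays for illustration. The helper names are hypothetical, and this is not the `transformers` source.

```python
import numpy as np

def apply_repetition_penalty(logits, seen_ids, penalty=1.2):
    """Penalize every token id already in the sequence (hypothetical helper)."""
    scores = np.asarray(logits, dtype=float).copy()
    for tok in set(seen_ids):
        # With penalty > 1: divide positive scores, multiply negative ones (both become less likely)
        scores[tok] = scores[tok] / penalty if scores[tok] > 0 else scores[tok] * penalty
    return scores

def banned_ngram_tokens(seen_ids, n=3):
    """Token ids that would complete an n-gram already present in seen_ids."""
    if len(seen_ids) < n:
        return set()
    prefix = tuple(seen_ids[len(seen_ids) - (n - 1):])  # last n-1 tokens
    banned = set()
    for i in range(len(seen_ids) - n + 1):
        if tuple(seen_ids[i:i + n - 1]) == prefix:
            banned.add(seen_ids[i + n - 1])
    return banned

def apply_no_repeat_ngram(logits, seen_ids, n=3):
    """Set the logits of banned continuations to -inf so they can never be sampled."""
    scores = np.asarray(logits, dtype=float).copy()
    for tok in banned_ngram_tokens(seen_ids, n):
        scores[tok] = -np.inf
    return scores

seen = [5, 6, 7, 5, 6]            # toy token ids; the 3-gram (5, 6, 7) already occurred
print(banned_ngram_tokens(seen))  # the next token may not be 7
```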

## 📝 License

This model is licensed under the **Gemma License**.