afrideva/Aira-2-1B1-GGUF
Quantized GGUF model files for Aira-2-1B1 from nicholasKluge
Name | Quant method | Size |
---|---|---|
aira-2-1b1.fp16.gguf | fp16 | 2.20 GB |
aira-2-1b1.q2_k.gguf | q2_k | 482.15 MB |
aira-2-1b1.q3_k_m.gguf | q3_k_m | 549.86 MB |
aira-2-1b1.q4_k_m.gguf | q4_k_m | 667.83 MB |
aira-2-1b1.q5_k_m.gguf | q5_k_m | 782.06 MB |
aira-2-1b1.q6_k.gguf | q6_k | 903.43 MB |
aira-2-1b1.q8_0.gguf | q8_0 | 1.17 GB |
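These GGUF files can be loaded with any llama.cpp-compatible runtime. The snippet below is a minimal sketch using the llama-cpp-python bindings; the chosen file, the local path, and the prompt template (taken from the special-token format described in the original model card below) are assumptions, not instructions shipped with this repository.

```python
# Minimal sketch (assumption: llama-cpp-python is installed and the q4_k_m
# file has already been downloaded from this repository to the current directory).
from llama_cpp import Llama

llm = Llama(model_path="aira-2-1b1.q4_k_m.gguf", n_ctx=2048)

# The Aira chat format wraps the instruction in its special tokens
# (see "Usage" in the original model card below).
prompt = "<|startofinstruction|>What is a language model?<|endofinstruction|>"

output = llm(
    prompt,
    max_tokens=256,
    temperature=0.7,
    top_p=0.95,
    stop=["<|endofcompletion|>"],  # stop once the completion marker is emitted
)
print(output["choices"][0]["text"])
```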
Original Model Card:
Aira-2-1B1
Aira-2 is the second version of the Aira instruction-tuned series. Aira-2-1B1 is an instruction-tuned GPT-style model based on TinyLlama-1.1B. The model was trained on a dataset of prompts and completions generated synthetically by prompting already-tuned models (ChatGPT, Llama, Open-Assistant, etc.).
Check out our Gradio demo in Spaces.
Details
- Size: 1,261,545,472 parameters
- Dataset: Instruct-Aira Dataset
- Language: English
- Number of Epochs: 3
- Batch size: 4
- Optimizer: torch.optim.AdamW (warmup_steps = 1e2, learning_rate = 5e-4, epsilon = 1e-8); a configuration sketch is given below
- GPU: 1 NVIDIA A100-SXM4-40GB
- Emissions: 1.78 KgCO2 (Singapore)
- Total Energy Consumption: 3.64 kWh
This repository has the source code used to train this model.
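For orientation only, the optimizer settings listed above translate roughly into the PyTorch setup below. This is a hedged sketch, not the repository's training script: the placeholder model, the step count, and the linear warmup schedule are illustrative assumptions.

```python
# Hedged sketch of the optimizer settings listed above; not the repository's
# actual training code. The tiny placeholder model and step count are
# illustrative assumptions.
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(8, 8)          # placeholder; stands in for Aira-2-1B1
num_training_steps = 1_000             # placeholder; real value depends on the dataloader

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-4,                           # learning_rate = 5e-4
    eps=1e-8,                          # epsilon = 1e-8
)

# A linear warmup schedule is assumed here; the card only states warmup_steps = 1e2.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,
    num_training_steps=num_training_steps,
)
```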
Usage
Three special tokens are used to mark the user side of the interaction and the model's response:
<|startofinstruction|>
What is a language model?<|endofinstruction|>
A language model is a probability distribution over a vocabulary.<|endofcompletion|>
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained('nicholasKluge/Aira-2-1B1')
aira = AutoModelForCausalLM.from_pretrained('nicholasKluge/Aira-2-1B1')

aira.eval()
aira.to(device)

question = input("Enter your question: ")

# Wrap the question in the chat template: bos_token (<|startofinstruction|>)
# opens the instruction and sep_token (<|endofinstruction|>) closes it.
inputs = tokenizer(tokenizer.bos_token + question + tokenizer.sep_token, return_tensors="pt").to(device)

responses = aira.generate(**inputs,
    bos_token_id=tokenizer.bos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,            # sample instead of greedy decoding
    top_k=50,
    max_length=500,
    top_p=0.95,
    temperature=0.7,
    num_return_sequences=2)    # return two candidate completions

print(f"Question: 👤 {question}\n")

for i, response in enumerate(responses):
    # Drop special tokens and the echoed question, keeping only the completion.
    print(f'Response {i+1}: 🤖 {tokenizer.decode(response, skip_special_tokens=True).replace(question, "")}')
The model will output something like:
>>>Question: 👤 What is the capital of Brazil?
>>>Response 1: 🤖 The capital of Brazil is Brasília.
>>>Response 2: 🤖 The capital of Brazil is Brasília.
Limitations
🤥 Generative models can perpetuate the generation of pseudo-informative content, that is, false information that may appear truthful.
🤬 In certain types of tasks, generative models can produce harmful and discriminatory content inspired by historical stereotypes.
Evaluation
Model (TinyLlama) | Average | ARC | TruthfulQA | ToxiGen |
---|---|---|---|---|
Aira-2-1B1 | 42.55 | 25.26 | 50.81 | 51.59 |
TinyLlama-1.1B-intermediate-step-480k-1T | 37.52 | 30.89 | 39.55 | 42.13 |
- Evaluations were performed using the Language Model Evaluation Harness (by EleutherAI).
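These scores can, in principle, be re-run with the harness. The snippet below is only a sketch using the harness's Python entry point; the task names, few-shot settings, and harness version are assumptions and may not match the configuration behind the table above.

```python
# Hedged sketch: re-running benchmarks with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Task names and defaults are assumptions and may not
# match the exact setup used for the numbers reported in the card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nicholasKluge/Aira-2-1B1",
    tasks=["arc_challenge", "truthfulqa_mc2", "toxigen"],
)
print(results["results"])
```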
Cite as 🤗
@misc{nicholas22aira,
  doi = {10.5281/zenodo.6989727},
  url = {https://huggingface.co/nicholasKluge/Aira-2-1B1},
  author = {Nicholas Kluge Corrêa},
  title = {Aira},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
}
License
Aira-2-1B1 is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 25.19 |
ARC (25-shot) | 23.21 |
HellaSwag (10-shot) | 26.97 |
MMLU (5-shot) | 24.86 |
TruthfulQA (0-shot) | 50.63 |
Winogrande (5-shot) | 50.28 |
GSM8K (5-shot) | 0.0 |
DROP (3-shot) | 0.39 |