---
library_name: transformers
language:
- es
base_model:
- IIC/RigoChat-7b-v2
pipeline_tag: text-generation
license: other
license_name: rigochat-nc
license_link: https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/LICENSE
tags:
- chat
---

# Model Card for RigoChat-7b-v2-GGUF

<div style="display: flex; align-items: flex-start;">

<div style="flex: 1;">
  
## Introduction

This repo contains the [IIC/RigoChat-7b-v2](https://huggingface.co/IIC/RigoChat-7b-v2) model in GGUF format, with the original weights in full precision as well as quantized to several lower precisions.

The [llama.cpp](https://github.com/ggerganov/llama.cpp) library was used both to convert the parameters to GGUF format and to perform the quantizations. Specifically, the following commands were used to obtain the model in full precision:

</div>

<div style="margin-left: 20px;">
<img src="./images/RigoChat.jpg">
</div>

</div>

1. To download the weights:

```python
from huggingface_hub import snapshot_download

model_id = "IIC/RigoChat-7b-v2"

# Download the original weights into ./model and keep the local path.
model_dir = snapshot_download(
    repo_id=model_id,
    local_dir="model",
    revision="main",
)

# Print the path so it can be exported as MODEL_DIR in the shell
# where the conversion command below runs.
print(model_dir)
```

2. To convert the weights to `FP16` (with `MODEL_DIR` exported in your shell, e.g. `export MODEL_DIR=model`):

```shell
python ./llama.cpp/convert_hf_to_gguf.py $MODEL_DIR --outfile rigochat-7b-v2-F16.gguf --outtype f16
```

Alternatively, you can download these weights directly [here](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/rigochat-7b-v2-F16.gguf).

To quantize `rigochat-7b-v2-F16.gguf` to different precisions, we first calculate an importance matrix as follows:

```shell
./llama.cpp/llama-imatrix -m ./rigochat-7b-v2-F16.gguf -f train_data.txt -c 1024
```

where `train_data.txt` is a Spanish raw-text dataset used for calibration; a minimal sketch of how such a file could be assembled is shown below. This command generates an `imatrix.dat` file that we can then use when quantizing the original model.
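As a hypothetical illustration (the dataset, sample count, and file layout here are assumptions, not the configuration used for the published quantizations), a calibration file could be built like this:

```python
# Hypothetical sketch: assemble a small Spanish calibration file for
# llama-imatrix. The dataset and sample count are illustrative only.
from datasets import load_dataset

# Stream a Spanish corpus so the full dataset is not downloaded.
ds = load_dataset("wikimedia/wikipedia", "20231101.es", split="train", streaming=True)

with open("train_data.txt", "w", encoding="utf-8") as f:
    for i, row in enumerate(ds):
        if i >= 1000:  # a modest sample is typically enough for calibration
            break
        f.write(row["text"].strip() + "\n\n")
```

With `imatrix.dat` generated, to obtain, for example, the `Q4_K_M` precision, run: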

```shell
./llama.cpp/llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf Q4_K_M
```

and so on for the other precisions. You can run:

```shell
./llama.cpp/llama-quantize --help
```

to see all the quantization options. To see how the importance matrix works, [this example](https://github.com/ggerganov/llama.cpp/blob/master/examples/imatrix/README.md) can be useful. For more information on the quantization types, see [this link](https://huggingface.co/docs/hub/gguf#quantization-types).
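If you need several quantizations, a small driver script can loop over the types. The sketch below is an assumption-laden example: it reuses the paths from the commands above and a llama.cpp build that provides `llama-quantize`.

```python
# Hypothetical sketch: build several quantizations in one pass. Paths and the
# list of types are assumptions; run `llama-quantize --help` for the full list.
import os
import subprocess

os.makedirs("quantize_models", exist_ok=True)

for qt in ["Q4_K_M", "Q5_K_M", "Q8_0"]:
    subprocess.run(
        [
            "./llama.cpp/llama-quantize",
            "--imatrix", "imatrix.dat",
            "./rigochat-7b-v2-F16.gguf",
            f"./quantize_models/rigochat-7b-v2-{qt}.gguf",
            qt,
        ],
        check=True,  # raise if llama-quantize exits with an error
    )
```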

#### Disclaimer

The `train_data.txt` dataset is optional for most quantizations. We used an experimental dataset to produce all the quantizations published here. However, we highly recommend downloading the weights in full precision (`rigochat-7b-v2-F16.gguf`) and quantizing the model with your own calibration dataset, adapted to your intended use case.


## How to Get Started with the Model

For example, to chat with the model from the command line:

```shell
./llama.cpp/llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "your system prompt" -fa -ngl -1 -n 512
```

or, to expose the model through an HTTP server:

```shell
./llama.cpp/llama-server -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "your system prompt" -fa -ngl -1 -n 512
```
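Once `llama-server` is running, you can query it from Python through its OpenAI-compatible chat endpoint. This sketch assumes the server's default host and port (`127.0.0.1:8080`) and an illustrative prompt:

```python
# Hypothetical sketch: query a running llama-server through its
# OpenAI-compatible endpoint. Host/port are llama-server defaults;
# the messages are illustrative.
import requests

response = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "Eres un asistente útil que responde en español."},
            {"role": "user", "content": "Resume en una frase qué es RigoChat."},
        ],
        "max_tokens": 256,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```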

## Evaluation

The evaluations are discussed in greater detail in the paper and the [official repository](https://huggingface.co/IIC/RigoChat-7b-v2). Here, we present only the graph illustrating how the model's performance improves as precision increases.

![Quantization results: model performance by precision](./images/quantization_results.png)

## Citation

```
@misc{instituto_de_ingeniería_del_conocimiento_2025,
      author    = {{Instituto de Ingeniería del Conocimiento}},
      title     = {RigoChat-7b-v2-GGUF},
      year      = 2025,
      url       = {https://huggingface.co/IIC/RigoChat-7b-v2-GGUF},
      doi       = {10.57967/hf/4159},
      publisher = {Hugging Face}
}
```

```
@misc{gómez2025rigochat2adaptedlanguage,
      title={RigoChat 2: an adapted language model to Spanish using a bounded dataset and reduced hardware}, 
      author={Gonzalo Santamaría Gómez and Guillem García Subies and Pablo Gutiérrez Ruiz and Mario González Valero and Natàlia Fuertes and Helena Montoro Zamorano and Carmen Muñoz Sanz and Leire Rosado Plaza and Nuria Aldama García and David Betancur Sánchez and Kateryna Sushkova and Marta Guerrero Nieto and Álvaro Barbero Jiménez},
      year={2025},
      eprint={2503.08188},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.08188}, 
}
```

## Model Card Contact

- `[email protected]`