|
---
license: mit
base_model: jpacifico/Chocolatine-3B-Instruct-DPO-Revised
pipeline_tag: text-generation
inference: false
model_creator: jpacifico
model_name: Chocolatine-3B-Instruct-DPO-Revised
model_type: phi3
language:
- fr
- en
datasets:
- jpacifico/french-orca-dpo-pairs-revised
library_name: transformers
quantized_by: ThiloteE
tags:
- text-generation-inference
- transformers
- GGUF
- GPT4All-community
- GPT4All
- conversational
- french
- chocolatine
---
|
|
|
> [!NOTE] |
|
> This model is expected to perform well, but it may need more testing and user feedback. Be aware that only models featured within the GUI of GPT4All are curated and officially supported by Nomic. Use at your own risk.
|
|
|
|
|
# About |
|
|
|
<!-- ### quantize_version: 3 --> |
|
<!-- ### convert_type: hf --> |
|
|
|
|
|
- Static quants of https://huggingface.co/jpacifico/Chocolatine-3B-Instruct-DPO-Revised at commit [fa3e742](https://huggingface.co/jpacifico/Chocolatine-3B-Instruct-DPO-Revised/commit/fa3e742dd80b3f38127fb62f5fc66eaf468fb95c) |
|
- Quantized by [ThiloteE](https://huggingface.co/ThiloteE) with llama.cpp commit [e09a800](https://github.com/ggerganov/llama.cpp/commit/e09a800f9a9b19c73aa78e03b4c4be8ed988f3e6) |
|
|
|
These quants were created with a customized configuration that has been shown not to produce visible end-of-string (EOS) tokens during inference with [GPT4All](https://www.nomic.ai/gpt4all).

The config.json, generation_config.json and tokenizer_config.json therefore differ from those found in the original model's repository at the time these quants were created.
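
For reference, a typical static-quant workflow with llama.cpp from that era looks roughly like the following. The exact invocations and filenames used for this repository are not recorded here, so treat this purely as an illustrative sketch:

```bash
# Illustrative sketch only; actual commands and filenames are assumptions.
# Convert the HF checkpoint to GGUF, then produce a static Q4_0 quant.
python convert_hf_to_gguf.py ./Chocolatine-3B-Instruct-DPO-Revised --outfile chocolatine-f16.gguf --outtype f16
./llama-quantize chocolatine-f16.gguf Chocolatine-3B-Instruct-DPO-Revised-Q4_0.gguf Q4_0
```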
|
|
|
|
|
# Prompt Template (for GPT4All) |
|
|
|
Example System Prompt: |
|
```
<|system|>
Vous trouverez ci-dessous une instruction décrivant une tâche. Rédigez une réponse qui réponde de manière appropriée à la demande.<|end|>

```

(The system prompt is French for: "Below is an instruction describing a task. Write a response that appropriately addresses the request.")
|
|
|
Chat Template: |
|
```
<|user|>
%1<|end|>
<|assistant|>
%2<|end|>

```
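
In GPT4All's template syntax, `%1` and `%2` are placeholders for the user's message and the model's reply. Rendered together, a single-turn exchange therefore looks like this (the user text, French for "What is a large language model?", is only an illustration):

```
<|system|>
Vous trouverez ci-dessous une instruction décrivant une tâche. Rédigez une réponse qui réponde de manière appropriée à la demande.<|end|>
<|user|>
Qu'est-ce qu'un grand modèle de langage ?<|end|>
<|assistant|>
```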
|
|
|
# Context Length |
|
|
|
`4096` |
|
|
|
Use a lower value during inference if you do not have enough RAM or VRAM.
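
If you run the GGUF outside of GPT4All, the context window is typically set at load time. Here is a minimal sketch with llama-cpp-python (an assumed runtime choice; any llama.cpp-based loader has an equivalent option), using the Q4_0 file listed below:

```python
from llama_cpp import Llama

# Load the quant with a reduced 2048-token context window to lower memory use.
llm = Llama(
    model_path="Chocolatine-3B-Instruct-DPO-Revised-Q4_0.gguf",
    n_ctx=2048,
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Qu'est-ce qu'un grand modèle de langage ?"}],
    max_tokens=256,
)
print(output["choices"][0]["message"]["content"])
```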
|
|
|
# Provided Quants |
|
|
|
|
|
| Link | Type | Size/GB | Notes |
|:-----|:-----|--------:|:------|
| [GGUF](https://huggingface.co/GPT4All-Community/Chocolatine-3B-Instruct-DPO-Revised-GGUF/resolve/main/Chocolatine-3B-Instruct-DPO-Revised-Q4_0.gguf?download=true) | Q4_0 | 2.44 | fast, recommended |
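
To fetch the file programmatically instead of via the browser link, a short sketch using the `huggingface_hub` package:

```python
from huggingface_hub import hf_hub_download

# Downloads into the local Hugging Face cache and returns the resolved path.
path = hf_hub_download(
    repo_id="GPT4All-Community/Chocolatine-3B-Instruct-DPO-Revised-GGUF",
    filename="Chocolatine-3B-Instruct-DPO-Revised-Q4_0.gguf",
)
print(path)
```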
|
|
|
|
|
|
|
|
|
# About GGUF |
|
|
|
If you are unsure how to use GGUF files, refer to one of [TheBloke's |
|
READMEs](https://huggingface.co/TheBloke/DiscoLM_German_7b_v1-GGUF) for |
|
more details, including on how to concatenate multi-part files. |
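
For a quick scripted test, the `gpt4all` Python bindings can load a single-file quant like the one above directly. A minimal sketch (parameter names follow the gpt4all Python API; double-check against your installed version):

```python
from gpt4all import GPT4All

# model_path points at the directory holding the downloaded .gguf file.
model = GPT4All(
    "Chocolatine-3B-Instruct-DPO-Revised-Q4_0.gguf",
    model_path=".",
    allow_download=False,
)

with model.chat_session():
    print(model.generate("Qu'est-ce qu'un grand modèle de langage ?", max_tokens=256))
```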
|
|
|
Here is a handy graph by ikawrakow comparing some quant types (lower is better): |
|
|
|
 |
|
|
|
And here are Artefact2's thoughts on the matter: |
|
https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9 |
|
|
|
# Thanks |
|
|
|
I thank Mradermacher and TheBloke for the inspiration for this model card and for their contributions to open source, and 3Simplex for lots of help along the way.
|
Shoutout to the GPT4All and llama.cpp communities :-) |
|
|
|
|
|
------ |
|
|
|
<!-- footer end --> |
|
<!-- original-model-card start --> |
|
|
|
|
|
------ |
|
------ |
|
|
|
# Original Model card: |
|
|
|
<!--- |
|
library_name: transformers |
|
license: mit |
|
language: |
|
- fr |
|
- en |
|
tags: |
|
- french |
|
- chocolatine |
|
datasets: |
|
- jpacifico/french-orca-dpo-pairs-revised |
|
pipeline_tag: text-generation |
|
---> |
|
|
|
### Chocolatine-3B-Instruct-DPO-Revised |
|
|
|
A DPO fine-tune of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) (3.82B params)

using the [jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised) RLHF dataset.

Training in French also improves the model in English, surpassing the performance of its base model.

Context window = 4k tokens
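
The exact training script is not published in this card. Purely as an illustration, a DPO pass of this shape can be sketched with Hugging Face TRL (hyperparameters and dataset column names are assumptions, and the `trl` API differs between versions):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Assumption: the dataset exposes the prompt/chosen/rejected columns DPOTrainer expects.
train_ds = load_dataset("jpacifico/french-orca-dpo-pairs-revised", split="train")

args = DPOConfig(output_dir="chocolatine-3b-dpo", beta=0.1, num_train_epochs=1)
trainer = DPOTrainer(model=model, args=args, train_dataset=train_ds, tokenizer=tokenizer)
trainer.train()
```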
|
|
|
### Benchmarks |
|
|
|
Chocolatine is the best-performing 3B model on the [OpenLLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) (August 2024).
|
|
|
 |
|
|
|
|
|
| Metric              | Value |
|:--------------------|------:|
| **Avg.**            | **27.63** |
| IFEval (0-shot)     | 56.23 |
| BBH (3-shot)        | 37.16 |
| MATH Lvl 5 (4-shot) | 14.50 |
| GPQA (0-shot)       |  9.62 |
| MuSR (0-shot)       | 15.10 |
| MMLU-PRO (5-shot)   | 33.21 |
|
|
|
|
|
### MT-Bench-French |
|
|
|
Chocolatine-3B-Instruct-DPO-Revised outperforms GPT-3.5-Turbo on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french) by Bofeng Huang,

evaluated with [multilingual-mt-bench](https://github.com/Peter-Devine/multilingual_mt_bench).
|
|
|
```
########## First turn ##########
model                                turn  score
gpt-3.5-turbo                        1     8.1375
Chocolatine-3B-Instruct-DPO-Revised  1     7.9875
Daredevil-8B                         1     7.8875
Daredevil-8B-abliterated             1     7.8375
Chocolatine-3B-Instruct-DPO-v1.0     1     7.6875
NeuralDaredevil-8B-abliterated       1     7.6250
Phi-3-mini-4k-instruct               1     7.2125
Meta-Llama-3-8B-Instruct             1     7.1625
vigostral-7b-chat                    1     6.7875
Mistral-7B-Instruct-v0.3             1     6.7500
Mistral-7B-Instruct-v0.2             1     6.2875
French-Alpaca-7B-Instruct_beta       1     5.6875
vigogne-2-7b-chat                    1     5.6625
vigogne-2-7b-instruct                1     5.1375

########## Second turn ##########
model                                turn  score
Chocolatine-3B-Instruct-DPO-Revised  2     7.937500
gpt-3.5-turbo                        2     7.679167
Chocolatine-3B-Instruct-DPO-v1.0     2     7.612500
NeuralDaredevil-8B-abliterated       2     7.125000
Daredevil-8B                         2     7.087500
Daredevil-8B-abliterated             2     6.873418
Meta-Llama-3-8B-Instruct             2     6.800000
Mistral-7B-Instruct-v0.2             2     6.512500
Mistral-7B-Instruct-v0.3             2     6.500000
Phi-3-mini-4k-instruct               2     6.487500
vigostral-7b-chat                    2     6.162500
French-Alpaca-7B-Instruct_beta       2     5.487395
vigogne-2-7b-chat                    2     2.775000
vigogne-2-7b-instruct                2     2.240506

########## Average ##########
model                                score
Chocolatine-3B-Instruct-DPO-Revised  7.962500
gpt-3.5-turbo                        7.908333
Chocolatine-3B-Instruct-DPO-v1.0     7.650000
Daredevil-8B                         7.487500
NeuralDaredevil-8B-abliterated       7.375000
Daredevil-8B-abliterated             7.358491
Meta-Llama-3-8B-Instruct             6.981250
Phi-3-mini-4k-instruct               6.850000
Mistral-7B-Instruct-v0.3             6.625000
vigostral-7b-chat                    6.475000
Mistral-7B-Instruct-v0.2             6.400000
French-Alpaca-7B-Instruct_beta       5.587866
vigogne-2-7b-chat                    4.218750
vigogne-2-7b-instruct                3.698113
```
|
|
|
### Usage |
|
|
|
You can run this model using my [Colab notebook](https://github.com/jpacifico/Chocolatine-LLM/blob/main/Chocolatine_3B_inference_test_colab.ipynb).
|
|
|
You can also run Chocolatine using the following code: |
|
|
|
```python
import transformers
from transformers import AutoTokenizer

# Model repository on the Hugging Face Hub
new_model = "jpacifico/Chocolatine-3B-Instruct-DPO-Revised"

# Format the prompt with the model's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create the text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
```
|
|
|
* **4-bit quantized version** is available here: [jpacifico/Chocolatine-3B-Instruct-DPO-Revised-Q4_K_M-GGUF](https://huggingface.co/jpacifico/Chocolatine-3B-Instruct-DPO-Revised-Q4_K_M-GGUF)
|
|
|
* **Ollama**: [jpacifico/chocolatine-3b](https://ollama.com/jpacifico/chocolatine-3b) |
|
|
|
```bash
ollama run jpacifico/chocolatine-3b
```
|
|
|
Ollama *Modelfile* example:
|
|
|
```bash
FROM ./chocolatine-3b-instruct-dpo-revised-q4_k_m.gguf
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>
"""
PARAMETER stop "<|end|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
SYSTEM """You are a friendly assistant called Chocolatine."""
```
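
After saving this as `Modelfile` next to the GGUF file, the usual Ollama workflow applies: build the local model with `ollama create chocolatine-3b -f Modelfile`, then start chatting with `ollama run chocolatine-3b`.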
|
|
|
### Limitations |
|
|
|
The Chocolatine model is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance. |
|
It does not have any moderation mechanism. |
|
|
|
- **Developed by:** Jonathan Pacifico, 2024 |
|
- **Model type:** LLM |
|
- **Language(s) (NLP):** French, English |
|
- **License:** MIT |
|
|
|
|
|
<!-- original-model-card end --> |
|
<!-- end --> |
|
|