---
license: llama3.2
language:
- en
base_model: prithivMLmods/Bellatrix-Tiny-3B-R1
library_name: transformers
tags:
- trl
- llama3.2
- Reinforcement learning
- llama-cpp
- gguf-my-repo
---
# Triangle104/Bellatrix-Tiny-3B-R1-Q6_K-GGUF
This model was converted to GGUF format from [`prithivMLmods/Bellatrix-Tiny-3B-R1`](https://huggingface.co/prithivMLmods/Bellatrix-Tiny-3B-R1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/prithivMLmods/Bellatrix-Tiny-3B-R1) for more details on the model.
---
Bellatrix is a reasoning-based model trained on DeepSeek-R1 synthetic dataset entries. The pipeline's instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks, and outperform many of the available open-source options. Bellatrix is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).
## Use with transformers
Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or the Auto classes with the `generate()` function.
Make sure to update your `transformers` installation via:
```bash
pip install --upgrade transformers
```
```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Bellatrix-Tiny-3B-R1"

# Load the model in bfloat16 and shard it across available devices.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
# The last message in the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```
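The same conversation can also be run through the Auto classes mentioned above. A minimal sketch, assuming the checkpoint ships the standard Llama chat template that `apply_chat_template` relies on:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/Bellatrix-Tiny-3B-R1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the chat template and tokenize in one step.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```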
Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generation, quantization, and more at [huggingface-llama-recipes](https://github.com/huggingface/huggingface-llama-recipes).
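As a taste of the quantization option, here is a minimal 4-bit loading sketch using `BitsAndBytesConfig`; it assumes a CUDA GPU and the `bitsandbytes` package are available:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute.
# Assumption: a CUDA GPU and the bitsandbytes package are installed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/Bellatrix-Tiny-3B-R1",
    quantization_config=bnb_config,
    device_map="auto",
)
```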
## Intended Use
Bellatrix is designed for applications that require advanced reasoning and multilingual dialogue capabilities. It is particularly suitable for:
- **Agentic retrieval:** intelligent retrieval of relevant information in a dialogue or query-response system.
- **Summarization tasks:** condensing large bodies of text into concise summaries for easier comprehension (a sketch follows this list).
- **Multilingual use cases:** supporting conversations in multiple languages with high accuracy and coherence.
- **Instruction-based applications:** following complex, context-aware instructions to generate precise outputs in a variety of scenarios.
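As an illustration of the summarization use case, the pipeline defined earlier can be reused directly; a minimal sketch, where `article` is a placeholder for your input text:
```python
# Reuses the `pipe` object from the transformers example above.
article = "..."  # placeholder: the text you want summarized

messages = [
    {"role": "system", "content": "Summarize the user's text in three sentences."},
    {"role": "user", "content": article},
]
outputs = pipe(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1]["content"])
```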
## Limitations
Despite its capabilities, Bellatrix has some limitations:
- **Domain specificity:** while it performs well on general tasks, performance may degrade on highly specialized or niche datasets.
- **Dependence on training data:** it is only as good as the quality and diversity of its training data, which may lead to biases or inaccuracies.
- **Computational resources:** the model's optimized transformer architecture can be resource-intensive, requiring significant computational power for fine-tuning and inference.
- **Language coverage:** while multilingual, some languages or dialects may have limited support or lower performance compared to widely used ones.
- **Real-world contexts:** it may struggle with understanding nuanced or ambiguous real-world scenarios not covered during training.
---
## Use with llama.cpp
Install llama.cpp through brew (works on macOS and Linux):
```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.
### CLI:
```bash
llama-cli --hf-repo Triangle104/Bellatrix-Tiny-3B-R1-Q6_K-GGUF --hf-file bellatrix-tiny-3b-r1-q6_k.gguf -p "The meaning to life and the universe is"
```
### Server:
```bash
llama-server --hf-repo Triangle104/Bellatrix-Tiny-3B-R1-Q6_K-GGUF --hf-file bellatrix-tiny-3b-r1-q6_k.gguf -c 2048
```
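Once the server is up, it can be queried over HTTP; recent llama.cpp builds expose an OpenAI-compatible chat endpoint (the sketch below assumes the default port 8080):
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Who are you?"}],
    "max_tokens": 128
  }'
```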
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
```bash
git clone https://github.com/ggerganov/llama.cpp
```
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with any hardware-specific flags (e.g. `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
```bash
cd llama.cpp && LLAMA_CURL=1 make
```
Step 3: Run inference through the main binary.
```bash
./llama-cli --hf-repo Triangle104/Bellatrix-Tiny-3B-R1-Q6_K-GGUF --hf-file bellatrix-tiny-3b-r1-q6_k.gguf -p "The meaning to life and the universe is"
```
or
```bash
./llama-server --hf-repo Triangle104/Bellatrix-Tiny-3B-R1-Q6_K-GGUF --hf-file bellatrix-tiny-3b-r1-q6_k.gguf -c 2048
```