---
license: llama3.2
language:
- en
base_model: prithivMLmods/Bellatrix-Tiny-3B-R1
library_name: transformers
tags:
- trl
- llama3.2
- Reinforcement learning
- llama-cpp
- gguf-my-repo
---

# Triangle104/Bellatrix-Tiny-3B-R1-Q5_K_S-GGUF
This model was converted to GGUF format from [`prithivMLmods/Bellatrix-Tiny-3B-R1`](https://huggingface.co/prithivMLmods/Bellatrix-Tiny-3B-R1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/prithivMLmods/Bellatrix-Tiny-3B-R1) for more details on the model.

---
Bellatrix is a reasoning-focused model built around DeepSeek-R1 synthetic dataset entries. The pipeline's instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks, and outperform many of the available open-source options. Bellatrix is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).

## Use with transformers

Starting with transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.

Make sure to update your transformers installation via:

```bash
pip install --upgrade transformers
```



```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Bellatrix-Tiny-3B-R1"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
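
The paragraph above also mentions running inference with the Auto classes and `generate()`. A minimal sketch of that route, assuming the model ships a chat template and reusing the example messages, looks like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/Bellatrix-Tiny-3B-R1"

# Load the tokenizer and model; device_map="auto" spreads weights across available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the conversation with the model's chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate a reply and decode only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```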



Note: You can also find detailed recipes on how to use the model locally, with torch.compile(), assisted generation, quantization, and more at huggingface-llama-recipes.
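
As one hedged example of the quantization path mentioned in that note (a generic sketch, not something specific to this model card), bitsandbytes 4-bit loading can cut memory use at some quality cost:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "prithivMLmods/Bellatrix-Tiny-3B-R1"

# 4-bit NF4 quantization; requires the bitsandbytes package and a CUDA GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```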

## Intended Use

Bellatrix is designed for applications that require advanced reasoning and multilingual dialogue capabilities. It is particularly suitable for:

- Agentic Retrieval: Enabling intelligent retrieval of relevant information in a dialogue or query-response system.
- Summarization Tasks: Condensing large bodies of text into concise summaries for easier comprehension.
- Multilingual Use Cases: Supporting conversations in multiple languages with high accuracy and coherence.
- Instruction-Based Applications: Following complex, context-aware instructions to generate precise outputs in a variety of scenarios.

## Limitations

Despite its capabilities, Bellatrix has some limitations:

- Domain Specificity: While it performs well on general tasks, its performance may degrade with highly specialized or niche datasets.
- Dependence on Training Data: It is only as good as the quality and diversity of its training data, which may lead to biases or inaccuracies.
- Computational Resources: The model’s optimized transformer architecture can be resource-intensive, requiring significant computational power for fine-tuning and inference.
- Language Coverage: While multilingual, some languages or dialects may have limited support or lower performance compared to widely used ones.
- Real-World Contexts: It may struggle with understanding nuanced or ambiguous real-world scenarios not covered during training.

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)

```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/Bellatrix-Tiny-3B-R1-Q5_K_S-GGUF --hf-file bellatrix-tiny-3b-r1-q5_k_s.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/Bellatrix-Tiny-3B-R1-Q5_K_S-GGUF --hf-file bellatrix-tiny-3b-r1-q5_k_s.gguf -c 2048
```
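
Once `llama-server` is running, it exposes an OpenAI-compatible chat endpoint you can query directly; the sketch below assumes the default port 8080 and an example prompt:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who are you?"}
    ],
    "max_tokens": 256
  }'
```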

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```bash
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g., `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```bash
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```bash
./llama-cli --hf-repo Triangle104/Bellatrix-Tiny-3B-R1-Q5_K_S-GGUF --hf-file bellatrix-tiny-3b-r1-q5_k_s.gguf -p "The meaning to life and the universe is"
```
or
```bash
./llama-server --hf-repo Triangle104/Bellatrix-Tiny-3B-R1-Q5_K_S-GGUF --hf-file bellatrix-tiny-3b-r1-q5_k_s.gguf -c 2048
```