ruslanmv committed
Commit 46df940 · verified · Parent: 7871688

Update README.md

Files changed (1):
  1. README.md +116 -9
README.md CHANGED
@@ -1,22 +1,129 @@
  ---
  base_model: ibm-granite/granite-3.1-2b-instruct
  tags:
- - text-generation-inference
+ - text-generation
  - transformers
- - unsloth
- - granite
  - gguf
+ - english
+ - granite
+ - text-generation-inference
+ - inference-endpoints
+ - conversational
+ - 4-bit
+ - 5-bit
+ - 8-bit
+ - ruslanmv
  license: apache-2.0
  language:
  - en
  ---

- # Uploaded model
-
- - **Developed by:** ruslanmv
- - **License:** apache-2.0
- - **Finetuned from model :** ibm-granite/granite-3.1-2b-instruct
-
- This granite model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ # Granite-3.1-2B-Reasoning-GGUF (Quantized for Efficiency)
+
+ ## Model Overview
+
+ This is a **GGUF quantized version** of **ruslanmv/granite-3.1-2b-Reasoning**, which was fine-tuned from **ibm-granite/granite-3.1-2b-instruct**. The **GGUF format** enables efficient inference on **CPU and GPU**, and the model is published here at three quantization levels (4-bit, 5-bit, and 8-bit).
+
+ - **Developed by:** [ruslanmv](https://huggingface.co/ruslanmv)
+ - **License:** Apache 2.0
+ - **Base Model:** [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct)
+ - **Fine-tuned for:** logical reasoning, structured problem-solving, and long-context tasks
+ - **Quantized GGUF versions available:**
+   - **4-bit:** `Q4_K_M`
+   - **5-bit:** `Q5_K_M`
+   - **8-bit:** `Q8_0`
+ - **Supported Languages:** English
+ - **Architecture:** Granite
+ - **Model Size:** 2.53B parameters
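+
+ As a rough guide, `Q4_K_M` is the smallest and fastest of the three quantizations, `Q8_0` stays closest to the full-precision weights, and `Q5_K_M` sits in between; pick the largest file that fits your RAM/VRAM budget.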
+
+ ---
+
+ ## Why Use the GGUF Quantized Version?
+
+ The **GGUF format** is designed for efficient **CPU and GPU inference**, enabling:
+
+ ✅ **Lower memory usage** for running on consumer hardware
+ ✅ **Faster inference speeds** without compromising reasoning ability
+ ✅ **Compatibility with popular inference engines** such as llama.cpp, ctransformers, and KoboldCpp
+
+ ---
+
+ ## Installation & Usage
+
+ To use this model with **llama.cpp** via its Python bindings, install:
+
+ ```bash
+ pip install llama-cpp-python
+ ```
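+
+ The GGUF weights themselves can be pulled from the Hub with `huggingface_hub`. A minimal sketch follows; the exact `filename` below is an assumption, so check the repository's file listing for the real name:
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download one quantization level; the filename is assumed, not confirmed.
+ gguf_path = hf_hub_download(
+     repo_id="ruslanmv/granite-3.1-2b-Reasoning-GGUF",
+     filename="granite-3.1-2b-Reasoning.Q4_K_M.gguf",
+ )
+ print(gguf_path)  # local cache path to pass to Llama(model_path=...)
+ ```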
+
+ ### Running the Model
+
+ To run the model from Python via **llama-cpp-python**:
+
+ ```python
+ from llama_cpp import Llama
+
+ # Path to the downloaded GGUF file (4-bit quantization shown here)
+ model_path = "path/to/ruslanmv/granite-3.1-2b-Reasoning-GGUF.Q4_K_M.gguf"
+
+ llm = Llama(model_path=model_path)
+
+ input_text = "Can you explain the difference between inductive and deductive reasoning?"
+ output = llm(input_text, max_tokens=400)
+
+ print(output["choices"][0]["text"])
+ ```
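+
+ For multi-turn prompting, **llama-cpp-python** also provides a chat-completion helper that applies the chat template from the GGUF metadata where available. A sketch; the `n_ctx` and `n_gpu_layers` values are illustrative:
+
+ ```python
+ # n_ctx sets the context window; n_gpu_layers=-1 offloads all layers to GPU.
+ llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)
+
+ response = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "Summarize deductive reasoning in one sentence."}],
+     max_tokens=200,
+ )
+ print(response["choices"][0]["message"]["content"])
+ ```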
+
+ Alternatively, using **ctransformers**:
+
+ ```bash
+ pip install ctransformers
+ ```
+
+ ```python
+ from ctransformers import AutoModelForCausalLM
+
+ model_path = "path/to/ruslanmv/granite-3.1-2b-Reasoning-GGUF.Q4_K_M.gguf"
+
+ # model_type selects ctransformers' architecture loader;
+ # gpu_layers is the number of layers to offload to the GPU (0 = CPU only).
+ model = AutoModelForCausalLM.from_pretrained(model_path, model_type="llama", gpu_layers=50)
+
+ input_text = "What are the key principles of logical reasoning?"
+ output = model(input_text, max_new_tokens=400)
+
+ print(output)
+ ```
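+
+ Note: ctransformers is not as actively maintained as llama.cpp, and its `model_type` options may not cover the newer Granite architecture; if loading fails with `model_type="llama"`, prefer the llama-cpp-python route above.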
+
+ ---
+
+ ## Intended Use
+
+ Granite-3.1-2B-Reasoning-GGUF is optimized for **efficient inference** while maintaining strong **reasoning capabilities**, making it ideal for:
+
+ - **Logical and analytical problem-solving**
+ - **Text-based reasoning tasks**
+ - **Mathematical and symbolic reasoning**
+ - **Advanced instruction-following**
+
+ This model is particularly useful for **CPU-based deployments** and users who need **low-memory, high-performance** text generation.
+
+ ---
+
+ ## License & Acknowledgments
+
+ This model is released under the **Apache 2.0** license. It is fine-tuned from IBM's **Granite 3.1-2B-Instruct** model and **quantized to GGUF** for efficiency. Special thanks to the **IBM Granite team** for developing the base model.
+
+ For more details, visit the [IBM Granite documentation](https://huggingface.co/ibm-granite).
+
+ ---
+
+ ### Citation
+
+ If you use this model in your research or applications, please cite:
+
+ ```bibtex
+ @misc{ruslanmv2025granite,
+   title={Fine-Tuning and GGUF Quantization of Granite-3.1 for Advanced Reasoning},
+   author={Ruslan M.V.},
+   year={2025},
+   url={https://huggingface.co/ruslanmv/granite-3.1-2b-Reasoning-GGUF}
+ }
+ ```