ruslanmv committed on
Commit 51a569a · verified · 1 Parent(s): f6c70a3

Update README.md

Files changed (1): README.md (+115 -9)
README.md CHANGED
@@ -1,22 +1,128 @@
  ---
  base_model: ibm-granite/granite-3.1-8b-instruct
  tags:
- - text-generation-inference
+ - text-generation
  - transformers
- - unsloth
- - granite
  - gguf
+ - english
+ - granite
+ - text-generation-inference
+ - inference-endpoints
+ - conversational
+ - 4-bit
+ - 5-bit
+ - 8-bit
+ - ruslanmv
  license: apache-2.0
  language:
  - en
  ---

- # Uploaded model

- - **Developed by:** ruslanmv
- - **License:** apache-2.0
- - **Finetuned from model :** ibm-granite/granite-3.1-8b-instruct

- This granite model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ # Granite-3.1-8B-Reasoning-GGUF (Quantized for Efficient Inference)
+
+ ## Model Overview
+
+ This is a **GGUF-quantized version** of **ruslanmv/granite-3.1-8b-Reasoning**, fine-tuned from **ibm-granite/granite-3.1-8b-instruct**. The **GGUF format** enables efficient inference on **CPUs and GPUs**; this repository provides several **k-bit quantization levels** (4-bit, 5-bit, and 8-bit).
+
+ - **Developed by:** [ruslanmv](https://huggingface.co/ruslanmv)
+ - **License:** Apache 2.0
+ - **Base Model:** [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct)
+ - **Fine-tuned for:** Logical reasoning, structured problem-solving, long-context tasks
+ - **Quantized GGUF versions available:**
+   - **4-bit:** `Q4_K_M`
+   - **5-bit:** `Q5_K_M`
+   - **8-bit:** `Q8_0`
+ - **Supported Languages:** English
+ - **Architecture:** Granite
+ - **Model Size:** 8.17B params
+
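+ To fetch a single quantization level without cloning the whole repository, you can use `huggingface_hub`. This is a minimal sketch: the exact `.gguf` filename below is an assumption, so check the repo's file list for the real names.
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Filename is hypothetical; use the actual .gguf name from this repo's Files tab.
+ model_file = hf_hub_download(
+     repo_id="ruslanmv/granite-3.1-8b-Reasoning-GGUF",
+     filename="granite-3.1-8b-Reasoning-GGUF.Q4_K_M.gguf",
+ )
+ print(model_file)  # local cache path of the downloaded GGUF file
+ ```
+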
+ ---
+
+ ## Why Use the GGUF Quantized Version?
+
+ The **GGUF format** is designed for efficient **CPU and GPU inference**, offering:
+
+ ✅ **Lower memory usage** for deployment on modest hardware
+ ✅ **Faster inference speeds** on consumer hardware
+ ✅ **Compatibility with leading inference engines** such as **llama.cpp, ctransformers, and KoboldCpp**
+ ✅ **Strong performance on logical reasoning and analytical tasks**
+
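+ To put "lower memory usage" in numbers, here is a back-of-envelope size estimate (file size ≈ parameters × bits per weight ÷ 8). The effective bits-per-weight figures below are rough assumptions for llama.cpp K-quants, not measured sizes of this repo's files:
+
+ ```python
+ # Rough GGUF file-size estimate: params * effective_bits_per_weight / 8 bytes.
+ PARAMS = 8.17e9  # model size from the overview above
+
+ # Effective bits/weight are approximate assumptions for these quant types.
+ for quant, bits in [("Q4_K_M", 4.8), ("Q5_K_M", 5.7), ("Q8_0", 8.5)]:
+     gib = PARAMS * bits / 8 / 2**30
+     print(f"{quant}: ~{gib:.1f} GiB")
+ ```
+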
+ ---
+
+ ## Installation & Usage
+
+ ### Install the **llama.cpp** Python bindings:
+
+ ```bash
+ pip install llama-cpp-python
+ ```
+
+ ### Running the Model with **llama.cpp**:
+
+ ```python
+ from llama_cpp import Llama
+
+ # Path to the downloaded .gguf file (see hf_hub_download above).
+ model_path = "path/to/ruslanmv/granite-3.1-8b-Reasoning-GGUF.Q4_K_M.gguf"
+
+ llm = Llama(model_path=model_path, n_ctx=4096)  # n_ctx sets the context window
+
+ input_text = "Can you explain the difference between inductive and deductive reasoning?"
+ output = llm(input_text, max_tokens=400)
+
+ print(output["choices"][0]["text"])
+ ```
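+
+ Because the base model is instruction-tuned, chat-style prompting usually behaves better than raw completion. A minimal sketch using llama-cpp-python's chat API, assuming the GGUF file carries the model's chat template (llama-cpp-python falls back to a generic one otherwise):
+
+ ```python
+ # Reuses the `llm` instance created above.
+ response = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "You are a careful reasoning assistant."},
+         {"role": "user", "content": "All cats are mammals. Tom is a mammal. Does it follow that Tom is a cat?"},
+     ],
+     max_tokens=400,
+ )
+ print(response["choices"][0]["message"]["content"])
+ ```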
+
+ ### Alternatively, using **ctransformers**:
+
+ ```bash
+ pip install ctransformers
+ ```
+
+ ```python
+ from ctransformers import AutoModelForCausalLM
+
+ # Path to the downloaded .gguf file.
+ model_path = "path/to/ruslanmv/granite-3.1-8b-Reasoning-GGUF.Q4_K_M.gguf"
+
+ # model_type="llama" selects the llama-family loader; gpu_layers offloads
+ # that many layers to the GPU (set to 0 for CPU-only inference).
+ model = AutoModelForCausalLM.from_pretrained(model_path, model_type="llama", gpu_layers=50)
+
+ input_text = "What are the key principles of logical reasoning?"
+ output = model(input_text, max_new_tokens=400)
+
+ print(output)
+ ```
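+
+ For interactive use, ctransformers can also stream tokens as they are generated instead of returning the full string at once. A small sketch, reusing the `model` from the previous block:
+
+ ```python
+ # stream=True yields text fragments as they are produced.
+ for token in model("Summarize the rules of syllogistic logic.", max_new_tokens=200, stream=True):
+     print(token, end="", flush=True)
+ print()
+ ```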
+
+ ---
+
+ ## Intended Use
+
+ Granite-3.1-8B-Reasoning-GGUF is designed for **efficient inference** while maintaining strong **reasoning capabilities**, making it ideal for:
+
+ - **Logical and analytical problem-solving**
+ - **Text-based reasoning tasks**
+ - **Mathematical and symbolic reasoning**
+ - **Advanced instruction-following**
+
+ This model is particularly beneficial for **CPU-based deployments**, **low-memory environments**, and users who need **optimized text generation without requiring high-end GPUs**.
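+
+ For CPU-only deployments, the llama-cpp-python settings that usually matter most are the thread count and the context window. A hedged sketch (good starting values vary by machine; the path is a placeholder):
+
+ ```python
+ import multiprocessing
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="path/to/ruslanmv/granite-3.1-8b-Reasoning-GGUF.Q4_K_M.gguf",
+     n_ctx=4096,                                           # context window in tokens
+     n_threads=max(1, multiprocessing.cpu_count() // 2),   # roughly the physical core count
+ )
+ ```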
106
+
107
+ ---
108
+
109
+ ## License & Acknowledgments
110
+
111
+ This model is released under the **Apache 2.0** license. It is fine-tuned from IBM’s **Granite 3.1-8B-Instruct** model and **quantized using GGUF** for optimal efficiency. Special thanks to the **IBM Granite Team** for developing the base model.
112
+
113
+ For more details, visit the [IBM Granite Documentation](https://huggingface.co/ibm-granite).
114
+
115
+ ---
116
 
117
+ ### Citation
 
 
118
 
119
+ If you use this model in your research or applications, please cite:
120
 
121
+ ```
122
+ @misc{ruslanmv2025granite,
123
+ title={Fine-Tuning and GGUF Quantization of Granite-3.1-8B for Advanced Reasoning},
124
+ author={Ruslan M.V.},
125
+ year={2025},
126
+ url={https://huggingface.co/ruslanmv/granite-3.1-8b-Reasoning-GGUF}
127
+ }
128
+ ```