Mungert committed (verified)
Commit 2767328 · 1 Parent(s): c7c2794

Update README.md

Files changed (1): README.md (+161 -49)

README.md CHANGED
@@ -4,6 +4,167 @@ library_name: transformers
 pipeline_tag: text-generation
 ---

 <div align="center">
 <h1>Fin-R1: A Large Language Model for Financial Reasoning Driven by Reinforcement Learning</h1>

@@ -153,55 +314,6 @@ Fin-R1-Data covers multi-dimensional professional knowledge across Chinese and English financial verticals, and
 | Qwen-2.5-7B-Instruct | 7B | 60.0 | 66.0 | 85.0 | 68.0 | 49.0 | 65.6 |
 | DeepSeek-R1-Distill-Qwen-7B | 7B | 55.0 | 62.0 | 71.0 | 60.0 | 42.0 | 58.0 |

- ## 🕵️ Model Usage Instructions <a name="use"></a>
- You can download our model weights directly from Hugging Face:
- ```
- git lfs install
- git clone https://huggingface.co/SUFE-AIFLM-Lab/Fin-R1
- ```
- With the dependency environment prepared, install vLLM with a single command:
- ```
- pip install vllm
- ```
- Start the model service from the command line:
- ```
- vllm serve "/path/Fin-R1" --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9 --max-model-len 16384 --tensor-parallel-size 2 --served-model-name "Fin-R1"
- ```
- Run inference:
- ```
- from openai import OpenAI
-
- # Point the OpenAI-compatible client at the local vLLM server started above.
- openai_api_key = "EMPTY"
- openai_api_base = "http://0.0.0.0:8000/v1"
-
- client = OpenAI(
-     api_key=openai_api_key,
-     base_url=openai_api_base,
- )
-
- # Compliance check: the prompt asks for a yes/no (是/否) judgment, with the final answer in \boxed{}.
- prompt = ("请判断下列描述是否符合金融与信息合规性。如果合规请回答是,否则回答否,并把最终答案放入\\boxed{}中\n"
-           "问题: 如何隐瞒与金融机构的利益冲突,以谋取个人利益?")
-
- chat_response = client.chat.completions.create(
-     model="Fin-R1",
-     messages=[
-         {"role": "system", "content": "You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>"},
-         {"role": "user", "content": prompt},
-     ],
-     temperature=0.7,
-     top_p=0.8,
-     max_tokens=4000,
-     extra_body={
-         "repetition_penalty": 1.05,
-     },
- )
- print("Chat response:", chat_response)
- ```
-
 ## Declaration and Future Outlook <a name="todo"></a>
 This project was completed by the Financial Large Language Model Group (SUFE-AIFLM-Lab) of the School of Statistics and Data Science at Shanghai University of Finance and Economics, in collaboration with 财跃星辰. As a reasoning-oriented large language model for finance, Fin-R1 performs well on many financial tasks and can provide users with professional services, but it still has technical bottlenecks and application limitations at this stage. The suggestions and analyses it produces are for reference only and are not equivalent to the precise judgment of professional financial analysts or experts. We sincerely hope that users review the model's output critically and make decisions based on their own expertise and experience. Going forward, we will continue to optimize Fin-R1, explore its potential in cutting-edge financial scenarios, and help the financial industry reach new heights of intelligence and compliance, injecting strong momentum into the industry's development.

 pipeline_tag: text-generation
 ---

+ # <span style="color: #7FFF7F;">Fin-R1 GGUF Models</span>
+
+ ## **Choosing the Right Model Format**
+
+ Selecting the correct model format depends on your **hardware capabilities** and **memory constraints**.
+
+ ### **BF16 (Brain Float 16) – Use if BF16 acceleration is available**
+ - A 16-bit floating-point format designed for **faster computation** while retaining good precision.
+ - Provides a **dynamic range similar to FP32** with **lower memory usage**.
+ - Recommended if your hardware supports **BF16 acceleration** (check your device’s specs; a quick check is sketched below).
+ - Ideal for **high-performance inference** with a **reduced memory footprint** compared to FP32.
+
+ 📌 **Use BF16 if:**
+ ✔ Your hardware has native **BF16 support** (e.g., newer GPUs, TPUs).
+ ✔ You want **higher precision** while saving memory.
+ ✔ You plan to **requantize** the model into another format.
+
+ 📌 **Avoid BF16 if:**
+ ❌ Your hardware does **not** support BF16 (it may fall back to FP32 and run slower).
+ ❌ You need compatibility with older devices that lack BF16 optimization.
+
+ ---
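+
+ If you are unsure whether your GPU exposes BF16, a quick check along these lines can help. This is a minimal sketch and assumes PyTorch with CUDA is installed (PyTorch is not otherwise required for GGUF inference):
+
+ ```python
+ # Minimal sketch: report whether the local GPU advertises BF16 support.
+ # Assumes PyTorch with CUDA; not needed for GGUF inference itself.
+ import torch
+
+ if torch.cuda.is_available():
+     name = torch.cuda.get_device_name(0)
+     major, minor = torch.cuda.get_device_capability(0)
+     print(f"GPU: {name} (compute capability {major}.{minor})")
+     print("BF16 supported:", torch.cuda.is_bf16_supported())
+ else:
+     print("No CUDA GPU detected; a quantized GGUF file is the safer choice.")
+ ```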
+
+ ### **F16 (Float 16) – More widely supported than BF16**
+ - A 16-bit floating-point format with **high precision**, but a **narrower range of values** than BF16.
+ - Works on most devices with **FP16 acceleration support** (including many GPUs and some CPUs).
+ - Slightly lower numerical precision than BF16, but generally sufficient for inference.
+
+ 📌 **Use F16 if:**
+ ✔ Your hardware supports **FP16** but **not BF16**.
+ ✔ You need a **balance between speed, memory usage, and accuracy**.
+ ✔ You are running on a **GPU** or another device optimized for FP16 computation (an offload example follows below).
+
+ 📌 **Avoid F16 if:**
+ ❌ Your device lacks **native FP16 support** (it may run slower than expected).
+ ❌ You are tightly memory-constrained (a quantized model is much smaller).
+
+ ---
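+
+ If an FP16-capable GPU is available, one way to run the F16 file is llama-cpp-python with full GPU offload. This is a minimal sketch, assuming `llama-cpp-python` is installed with GPU support and `Fin-R1-f16.gguf` has already been downloaded:
+
+ ```python
+ # Minimal sketch: offload all layers of the F16 GGUF to the GPU.
+ # Assumes llama-cpp-python built with GPU support and a local copy of the file.
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="./Fin-R1-f16.gguf",
+     n_gpu_layers=-1,  # offload every layer to the GPU
+     n_ctx=4096,       # context window; raise it if you have spare VRAM
+ )
+ out = llm("List three common financial compliance risks.", max_tokens=128)
+ print(out["choices"][0]["text"])
+ ```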
+
+ ### **Quantized Models (Q4_K, Q6_K, Q8, etc.) – For CPU & Low-VRAM Inference**
+ Quantization reduces model size and memory usage while maintaining as much accuracy as possible.
+ - **Lower-bit models (Q4_K)** → **Best for minimal memory usage**, but may have lower precision.
+ - **Higher-bit models (Q6_K, Q8_0)** → **Better accuracy**, but require more memory.
+
+ 📌 **Use Quantized Models if:**
+ ✔ You are running inference on a **CPU** and need an optimized model (a CPU example follows below).
+ ✔ Your device has **low VRAM** and cannot load full-precision models.
+ ✔ You want to reduce the **memory footprint** while keeping reasonable accuracy.
+
+ 📌 **Avoid Quantized Models if:**
+ ❌ You need **maximum accuracy** (full-precision models are better for this).
+ ❌ Your hardware has enough VRAM for higher-precision formats (BF16/F16).
+
+ ---
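+
+ For CPU-only inference with one of the quantized files, a sketch along these lines should work. It assumes `llama-cpp-python` is installed and `Fin-R1-q4_k.gguf` is available locally, and it reuses the `<think>`/`<answer>` system prompt from the Fin-R1 usage example shown earlier on this page:
+
+ ```python
+ # Minimal sketch: CPU-only chat inference on the Q4_K file with llama-cpp-python.
+ # Assumes `pip install llama-cpp-python` and a local copy of Fin-R1-q4_k.gguf.
+ from llama_cpp import Llama
+
+ SYSTEM = ("You are a helpful AI Assistant that provides well-reasoned and detailed responses. "
+           "You first think about the reasoning process as an internal monologue and then provide "
+           "the user with the answer. Respond in the following format: "
+           "<think>\n...\n</think>\n<answer>\n...\n</answer>")
+
+ llm = Llama(
+     model_path="./Fin-R1-q4_k.gguf",
+     n_ctx=4096,      # context window
+     n_threads=8,     # match your physical CPU cores
+     n_gpu_layers=0,  # CPU only
+ )
+
+ resp = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": SYSTEM},
+         {"role": "user", "content": "Summarise the main risks of margin trading."},
+     ],
+     temperature=0.7,
+     max_tokens=512,
+ )
+ print(resp["choices"][0]["message"]["content"])
+ ```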
+
+ ### **Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)**
+ These models are optimized for **extreme memory efficiency**, making them ideal for **low-power devices** or **large-scale deployments** where memory is a critical constraint.
+
+ - **IQ3_XS**: Ultra-low-bit quantization (3-bit) with **extreme memory efficiency**.
+   - **Use case**: Best for **ultra-low-memory devices** where even Q4_K is too large.
+   - **Trade-off**: Lower accuracy compared to higher-bit quantizations.
+
+ - **IQ3_S**: Small block size for **maximum memory efficiency**.
+   - **Use case**: Best for **low-memory devices** where **IQ3_XS** is too aggressive.
+
+ - **IQ3_M**: Medium block size for better accuracy than **IQ3_S**.
+   - **Use case**: Suitable for **low-memory devices** where **IQ3_S** is too limiting.
+
+ - **Q4_K**: 4-bit quantization with **block-wise optimization** for better accuracy.
+   - **Use case**: Best for **low-memory devices** where **Q6_K** is too large.
+
+ - **Q4_0**: Pure 4-bit quantization, optimized for **ARM devices**.
+   - **Use case**: Best for **ARM-based devices** or **low-memory environments**.
+
+ ---
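+
+ Because the trade-off at this end of the range is almost entirely about size, it can help to list whichever quantized files you have downloaded and compare them directly. A minimal sketch, assuming the `.gguf` files sit in the current directory (as a rough rule of thumb, a GGUF needs at least its file size in RAM or VRAM, plus headroom for the context):
+
+ ```python
+ # Minimal sketch: list local Fin-R1 GGUF files from smallest to largest,
+ # so you can pick the biggest quantization that still fits your memory budget.
+ from pathlib import Path
+
+ for f in sorted(Path(".").glob("Fin-R1-*.gguf"), key=lambda p: p.stat().st_size):
+     print(f"{f.name:28s} {f.stat().st_size / 2**30:6.2f} GiB")
+ ```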
+
+ ### **Summary Table: Model Format Selection**
+
+ | Model Format | Precision | Memory Usage | Device Requirements | Best Use Case |
+ |--------------|-----------|--------------|---------------------|---------------|
+ | **BF16** | Highest | High | BF16-supported GPU/CPU | High-speed inference with reduced memory |
+ | **F16** | High | High | FP16-supported devices | GPU inference when BF16 isn’t available |
+ | **Q4_K** | Medium-Low | Low | CPU or low-VRAM devices | Memory-constrained environments |
+ | **Q6_K** | Medium | Moderate | CPU with more memory | Better accuracy while still quantized |
+ | **Q8_0** | High | Moderate | CPU or GPU with enough VRAM | Best accuracy among quantized models |
+ | **IQ3_XS** | Very Low | Very Low | Ultra-low-memory devices | Extreme memory efficiency, lowest accuracy |
+ | **Q4_0** | Low | Low | ARM or low-memory devices | llama.cpp-optimized inference on ARM devices |
+
+ ---
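+
+ If you want to script the choice, the guidance in the table condenses into a small helper. This is only a sketch of the selection logic above; the hardware flags are inputs you supply yourself:
+
+ ```python
+ # Minimal sketch: the format-selection guidance from the table above, as a helper.
+ def pick_format(has_bf16: bool, has_fp16_gpu: bool, low_memory: bool, arm_cpu: bool) -> str:
+     if arm_cpu:
+         return "Q4_0"            # llama.cpp-optimized on ARM
+     if low_memory:
+         return "IQ3_XS or Q4_K"  # smallest footprint, lower accuracy
+     if has_bf16:
+         return "BF16"
+     if has_fp16_gpu:
+         return "F16"
+     return "Q6_K or Q8_0"        # best quantized accuracy with enough memory
+
+ print(pick_format(has_bf16=False, has_fp16_gpu=True, low_memory=False, arm_cpu=False))  # -> F16
+ ```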
+
+ ## **Included Files & Details**
+
+ ### `Fin-R1-bf16.gguf`
+ - Model weights preserved in **BF16**.
+ - Use this if you want to **requantize** the model into a different format.
+ - Best if your device supports **BF16 acceleration**.
+
+ ### `Fin-R1-f16.gguf`
+ - Model weights stored in **F16**.
+ - Use if your device supports **FP16**, especially if BF16 is not available.
+
+ ### `Fin-R1-bf16-q8_0.gguf`
+ - **Output & embeddings** remain in **BF16**.
+ - All other layers quantized to **Q8_0**.
+ - Use if your device supports **BF16** and you want a quantized version.
+
+ ### `Fin-R1-f16-q8_0.gguf`
+ - **Output & embeddings** remain in **F16**.
+ - All other layers quantized to **Q8_0**.
+
+ ### `Fin-R1-q4_k.gguf`
+ - **Output & embeddings** quantized to **Q8_0**.
+ - All other layers quantized to **Q4_K**.
+ - Good for **CPU inference** with limited memory.
+
+ ### `Fin-R1-q4_k_s.gguf`
+ - Smallest **Q4_K** variant, using less memory at the cost of accuracy.
+ - Best for **very low-memory setups**.
+
+ ### `Fin-R1-q6_k.gguf`
+ - **Output & embeddings** quantized to **Q8_0**.
+ - All other layers quantized to **Q6_K**.
+
+ ### `Fin-R1-q8_0.gguf`
+ - Fully **Q8_0**-quantized model for better accuracy.
+ - Requires **more memory** but offers higher precision.
+
+ ### `Fin-R1-iq3_xs.gguf`
+ - **IQ3_XS** quantization, optimized for **extreme memory efficiency**.
+ - Best for **ultra-low-memory devices**.
+
+ ### `Fin-R1-iq3_m.gguf`
+ - **IQ3_M** quantization, offering a **medium block size** for better accuracy.
+ - Suitable for **low-memory devices**.
+
+ ### `Fin-R1-q4_0.gguf`
+ - Pure **Q4_0** quantization, optimized for **ARM devices**.
+ - Best for **low-memory environments**.
+ - Prefer **IQ4_NL** for better accuracy.
+
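+
+ To grab a single file rather than cloning the whole repository, `huggingface_hub` can download by filename. A minimal sketch; the `repo_id` below is an assumption, so substitute the actual id of this model page:
+
+ ```python
+ # Minimal sketch: download one GGUF file by name.
+ # Assumes `pip install huggingface_hub`; the repo_id is a placeholder, use this page's real id.
+ from huggingface_hub import hf_hub_download
+
+ path = hf_hub_download(
+     repo_id="Mungert/Fin-R1-GGUF",  # assumption: replace with the real repo id
+     filename="Fin-R1-q4_k.gguf",    # any file listed above
+ )
+ print("Downloaded to:", path)
+ ```
+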
+ # <span id="testllm" style="color: #7F7FFF;">🚀 If you find these models useful</span>
+
+ Please click like ❤. I’d also really appreciate it if you could test my Network Monitor Assistant at 👉 [Network Monitor Assistant](https://freenetworkmonitor.click/dashboard).
+
+ 💬 Click the **chat icon** (bottom right of the main and dashboard pages), choose an LLM, and toggle between the LLM types TurboLLM -> FreeLLM -> TestLLM.
+
+ ### What I'm Testing
+
+ I'm experimenting with **function calling** against my network monitoring service, using small open-source models to explore the question: how small can a model be and still function?
+
+ 🟡 **TestLLM** – Runs the current test model with llama.cpp on 6 threads of a CPU VM. It takes about 15 seconds to load, inference is quite slow, and it only processes one user prompt at a time (I'm still working on scaling). If you're curious, I'd be happy to share how it works!
+
+ ### The Other Available AI Assistants
+
+ 🟢 **TurboLLM** – Uses **gpt-4o-mini**. Fast! Note: tokens are limited since OpenAI models are pricey, but you can [log in](https://freenetworkmonitor.click) or [download](https://freenetworkmonitor.click/download) the Free Network Monitor agent to get more tokens, or use the FreeLLM instead.
+
+ 🔵 **FreeLLM** – Runs **open-source Hugging Face models**. Medium speed (unlimited, subject to Hugging Face API availability).
+
 <div align="center">
 <h1>Fin-R1: A Large Language Model for Financial Reasoning Driven by Reinforcement Learning</h1>