base_model: taide/Llama-3.1-TAIDE-LX-8B-Chat
base_model_relation: quantized
---

Quantized from [taide/Llama-3.1-TAIDE-LX-8B-Chat](https://huggingface.co/taide/Llama-3.1-TAIDE-LX-8B-Chat) with [llama.cpp](https://github.com/ggml-org/llama.cpp) release b4739.
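
If you only want to sanity-check the GGUF file outside of Kuwa, llama.cpp's own CLI can load it directly. A minimal sketch (the download step assumes the `huggingface_hub` package is installed; the prompt and paths are illustrative, not part of this repo's instructions):

```sh
# Fetch the quantized model from this repo (requires `pip install huggingface_hub`).
huggingface-cli download tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF \
  Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf --local-dir .

# Interactive smoke test with llama.cpp's CLI (built from release b4739 or newer).
./llama-cli -m Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf -cnv -p "你是一個來自台灣的AI助理。"
```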

## Running the quantized TAIDE model in Kuwa

### Windows version

1. Install the [Kuwa v0.3.4 Windows installer](https://kuwaai.org/zh-Hant/release/kuwa-os-v0.3.4).
2. Go to the `C:\kuwa\GenAI OS\windows\executors\taide` directory, back up the original model `taide-8b-a.3-q4_k_m.gguf` somewhere outside this directory, and delete `run.bat`.
3. Download `Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf` from [tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF](https://huggingface.co/tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF) into the `C:\kuwa\GenAI OS\windows\executors\taide` directory.
4. Run `C:\kuwa\GenAI OS\windows\executors\taide` and use the following settings:
   - Enter the option number (1-5): `3`
   - Enter the model name: `Llama-3.1 TAIDE LX-8B Chat Q4_K_M`
   - Enter the access code: `llama-3.1-taide-lx-8b-chat-q4_k_m`
   - Arguments to use (...): `--stop "<|eot_id|>"`
5. Restart Kuwa and the new TAIDE model will be available.

### Docker version

1. Download the [Kuwa v0.3.4 source code](https://github.com/kuwaai/genai-os/tree/main).
2. Install Kuwa by following the [installation guide](https://github.com/kuwaai/genai-os/blob/main/docs/docker_quick_installation.md); the [community-contributed manual](https://kuwaai.org/zh-Hant/os/User%20Manual) is another useful reference.
3. Create the configuration file `genai-os/docker/compose/sample/taide-llamacpp.yaml` with the following content:
   ```yaml
   services:
     llamacpp-executor:
       image: kuwaai/model-executor
       environment:
         EXECUTOR_TYPE: llamacpp
         EXECUTOR_ACCESS_CODE: llama-3.1-taide-lx-8b-chat-q4_k_m
         EXECUTOR_NAME: Llama-3.1 TAIDE LX-8B Chat Q4_K_M
         EXECUTOR_IMAGE: llamacpp.png # Refer to src/multi-chat/public/images
       depends_on:
         - executor-builder
         - kernel
         - multi-chat
       command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--temperature", "0"]
       # or use GPU
       # command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--ngl", "-1", "--temperature", "0"]
       restart: unless-stopped
       volumes: ["/path/to/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf:/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf"] # Remember to change path
       # Uncomment to use GPU
       # deploy:
       #   resources:
       #     reservations:
       #       devices:
       #         - driver: nvidia
       #           device_ids: ['0']
       #           capabilities: [gpu]
       networks: ["backend"]
   ```
4. Run `./run.sh` in the `genai-os/docker` directory to start the new model; an optional sanity check is sketched below.
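
Once the stack is up, a quick container check can confirm the executor started. This is a minimal sketch assuming Docker Compose's default `<project>-<service>-<index>` container naming; adjust the filter to whatever `docker ps` actually shows:

```sh
# The executor container should be listed as "Up".
docker ps --filter "name=llamacpp-executor"

# Watch the executor load the GGUF model (container ID resolved via the same filter).
docker logs -f "$(docker ps -q --filter 'name=llamacpp-executor')"
```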

## How to quantize

For the method of quantizing Llama-3.1-TAIDE-LX-8B-Chat and the challenges involved, see San-Li Hsu's note ["使用llama.cpp將Hugging Face模型權重(safetensors)轉換成GGUF並進行量化"](https://hackmd.io/@San-Li/S1FEOWk9kg) (converting Hugging Face model weights in safetensors format to GGUF and quantizing them with llama.cpp).
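
For orientation, the typical llama.cpp conversion-and-quantization flow looks roughly like this. This is a hedged sketch, not the exact commands from the note above; the checkout tag matches the release named in this README, and all local paths are placeholders:

```sh
# Build llama.cpp at the release used for this repo's files.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout b4739
pip install -r requirements.txt
cmake -B build && cmake --build build --target llama-quantize

# 1. Convert the safetensors checkpoint to a full-precision GGUF.
python convert_hf_to_gguf.py /path/to/Llama-3.1-TAIDE-LX-8B-Chat \
  --outfile Llama-3.1-TAIDE-LX-8B-Chat-F16.gguf --outtype f16

# 2. Quantize the F16 GGUF down to Q4_K_M.
./build/bin/llama-quantize Llama-3.1-TAIDE-LX-8B-Chat-F16.gguf \
  Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf Q4_K_M
```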