ifTNT committed on
Commit 4d48258 · verified · 1 Parent(s): d156f3e

Update README.md

Files changed (1)
  1. README.md +53 -3
README.md CHANGED
@@ -7,7 +7,57 @@ base_model: taide/Llama-3.1-TAIDE-LX-8B-Chat
  base_model_relation: quantized
  ---
 
- This repository contains quantized models of [taide/Llama-3.1-TAIDE-LX-8B-Chat](https://huggingface.co/taide/Llama-3.1-TAIDE-LX-8B-Chat) using [llama.cpp](https://github.com/ggml-org/llama.cpp) version b4739.
-
- ## Instructions to run this model in Kuwa
- TODO
+ Models quantized from [taide/Llama-3.1-TAIDE-LX-8B-Chat](https://huggingface.co/taide/Llama-3.1-TAIDE-LX-8B-Chat) using [llama.cpp](https://github.com/ggml-org/llama.cpp) release b4739.
+
+ ## How to run the quantized TAIDE model in Kuwa
+
+ ### Windows version
+
+ 1. Install the [Kuwa v0.3.4 Windows installer](https://kuwaai.org/zh-Hant/release/kuwa-os-v0.3.4)
+ 2. Go to the `C:\kuwa\GenAI OS\windows\executors\taide` directory, back up the original model `taide-8b-a.3-q4_k_m.gguf` outside this directory, and delete `run.bat`
+ 3. Download `Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf` from [tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF](https://huggingface.co/tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF) into the `C:\kuwa\GenAI OS\windows\executors\taide` directory
+ 4. Run `C:\kuwa\GenAI OS\windows\executors\taide` and use the following settings
+    - Enter the option number (1-5): `3`
+    - Enter the model name: `Llama-3.1 TAIDE LX-8B Chat Q4_K_M`
+    - Enter the access code: `llama-3.1-taide-lx-8b-chat-q4_k_m`
+    - Arguments to use (...): `--stop "<|eot_id|>"`
+ 5. Restart Kuwa and the new TAIDE model will be available
+
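The file fetched in step 3 is several gigabytes, so a truncated download is easy to miss. Every GGUF file begins with the 4-byte ASCII magic `GGUF`, which makes a cheap header check possible before restarting Kuwa. A minimal sketch (the synthetic `demo.gguf` stands in for the real model path under `C:\kuwa\GenAI OS\windows\executors\taide`):

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # every GGUF container starts with this 4-byte magic

def looks_like_gguf(path: str) -> bool:
    """Cheap sanity check that a downloaded file is a GGUF container."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC

# Demo on a synthetic file; point this at the downloaded
# Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf in practice.
Path("demo.gguf").write_bytes(GGUF_MAGIC + b"\x03\x00\x00\x00")
print(looks_like_gguf("demo.gguf"))  # -> True
```

This only validates the header, not the full file; comparing the file size against the one shown on the Hugging Face repository page is a useful second check.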
+ ### Docker version
+
+ 1. Download the [Kuwa v0.3.4 source code](https://github.com/kuwaai/genai-os/tree/main)
+ 2. Install Kuwa by following the [installation guide](https://github.com/kuwaai/genai-os/blob/main/docs/docker_quick_installation.md); the [community-contributed manual](https://kuwaai.org/zh-Hant/os/User%20Manual) may also be helpful
+ 3. Add a `genai-os/docker/compose/sample/taide-llamacpp.yaml` config file with the following content
+ ```yaml
+ services:
+   llamacpp-executor:
+     image: kuwaai/model-executor
+     environment:
+       EXECUTOR_TYPE: llamacpp
+       EXECUTOR_ACCESS_CODE: llama-3.1-taide-lx-8b-chat-q4_k_m
+       EXECUTOR_NAME: Llama-3.1 TAIDE LX-8B Chat Q4_K_M
+       EXECUTOR_IMAGE: llamacpp.png # Refer to src/multi-chat/public/images
+     depends_on:
+       - executor-builder
+       - kernel
+       - multi-chat
+     command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--temperature", "0"]
+     # or use GPU
+     # command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--ngl", "-1", "--temperature", "0"]
+     restart: unless-stopped
+     volumes: ["/path/to/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf:/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf"] # Remember to change the path
+     # Uncomment to use GPU
+     # deploy:
+     #   resources:
+     #     reservations:
+     #       devices:
+     #         - driver: nvidia
+     #           device_ids: ['0']
+     #           capabilities: [gpu]
+     networks: ["backend"]
+ ```
+ 4. Run `./run.sh` in the `genai-os/docker` directory to start the new model
+
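A YAML indentation mistake in the new compose file is a common failure at this step, and it is cheaper to catch before invoking `./run.sh`. A minimal pre-flight parse, assuming PyYAML is installed (`pip install pyyaml`) and using a trimmed inline copy of `taide-llamacpp.yaml` (in practice, read the file from disk instead):

```python
import yaml  # PyYAML; assumption: installed separately

# Trimmed mirror of taide-llamacpp.yaml -- replace with
# open("compose/sample/taide-llamacpp.yaml") in real use.
COMPOSE = """
services:
  llamacpp-executor:
    image: kuwaai/model-executor
    environment:
      EXECUTOR_TYPE: llamacpp
      EXECUTOR_ACCESS_CODE: llama-3.1-taide-lx-8b-chat-q4_k_m
      EXECUTOR_NAME: Llama-3.1 TAIDE LX-8B Chat Q4_K_M
    command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--temperature", "0"]
    networks: ["backend"]
"""

cfg = yaml.safe_load(COMPOSE)
svc = cfg["services"]["llamacpp-executor"]
# Indentation mistakes typically surface here as a parse error or KeyError.
assert svc["environment"]["EXECUTOR_TYPE"] == "llamacpp"
assert svc["command"][1].endswith(".gguf")
print("compose fragment parses OK")
```

This only confirms the YAML structure; it does not check that the host path in `volumes` actually exists.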
+ # How to quantize
+
+ For the method used to quantize Llama-3.1-TAIDE-LX-8B-Chat and the challenges involved, see San-Li Hsu's note ["使用llama.cpp將Hugging Face模型權重(safetensors)轉換成GGUF並進行量化"](https://hackmd.io/@San-Li/S1FEOWk9kg) (converting Hugging Face safetensors weights to GGUF with llama.cpp and quantizing them).
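The overall flow is two steps: convert the safetensors checkpoint to an unquantized (f16) GGUF file, then quantize it. A sketch of a small driver around the llama.cpp tools follows; the tool names and paths (`convert_hf_to_gguf.py`, `build/bin/llama-quantize`, the local `llama.cpp` checkout) are assumptions based on recent llama.cpp releases and should be verified against the b4739 checkout:

```python
from pathlib import Path

# Assumed location of a built llama.cpp checkout (illustrative path).
LLAMA_CPP = Path("llama.cpp")  # github.com/ggml-org/llama.cpp at b4739

def convert_cmd(model_dir: Path, out_f16: Path) -> list:
    # convert_hf_to_gguf.py reads Hugging Face safetensors weights and
    # writes an unquantized f16 GGUF file.
    return ["python", str(LLAMA_CPP / "convert_hf_to_gguf.py"), str(model_dir),
            "--outfile", str(out_f16), "--outtype", "f16"]

def quantize_cmd(in_f16: Path, out_gguf: Path, qtype: str = "Q4_K_M") -> list:
    # llama-quantize re-encodes the f16 GGUF with the chosen quantization type.
    return [str(LLAMA_CPP / "build" / "bin" / "llama-quantize"),
            str(in_f16), str(out_gguf), qtype]

f16 = Path("Llama-3.1-TAIDE-LX-8B-Chat-f16.gguf")
q4 = Path("Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf")
print(convert_cmd(Path("Llama-3.1-TAIDE-LX-8B-Chat"), f16))
print(quantize_cmd(f16, q4))
```

Once llama.cpp is cloned and built, each command list can be executed in order with `subprocess.run(cmd, check=True)`; the note linked above covers the details and pitfalls.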