---
license: other
license_name: taide-l-models-community-license-agreement
license_link: >-
https://drive.google.com/file/d/1zvRktQwKCJesePKLCMWO-ah23xkbIdFB/view?usp=sharing
base_model: taide/Llama-3.1-TAIDE-LX-8B-Chat
base_model_relation: quantized
---
A quantized version of [taide/Llama-3.1-TAIDE-LX-8B-Chat](https://huggingface.co/taide/Llama-3.1-TAIDE-LX-8B-Chat), produced with [llama.cpp](https://github.com/ggml-org/llama.cpp) release b4739.
## How to run the quantized TAIDE model in Kuwa
### Windows
1. Install Kuwa with the [Kuwa v0.3.4 Windows installer](https://kuwaai.org/zh-Hant/release/kuwa-os-v0.3.4).
2. Go to the `C:\kuwa\GenAI OS\windows\executors\taide` directory, back up the original model file `taide-8b-a.3-q4_k_m.gguf` somewhere outside this directory, and delete `run.bat`.
3. Download `Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf` from [tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF](https://huggingface.co/tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF) into the `C:\kuwa\GenAI OS\windows\executors\taide` directory.
4. Run `C:\kuwa\GenAI OS\windows\executors\taide\init.bat` with the following settings:
    - Enter the option number (1-5): `3`
    - Enter the model name: `Llama-3.1 TAIDE LX-8B Chat Q4_K_M`
    - Enter the access code: `llama-3.1-taide-lx-8b-chat-q4_k_m`
    - Arguments to use (...): `--stop "<|eot_id|>"`
5. Restart Kuwa; the new TAIDE model will then appear.
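Instead of downloading through the browser in step 3, the model file can also be fetched from the command line. A sketch assuming the `huggingface_hub` CLI is installed and you are in a shell (e.g. Git Bash or PowerShell) with network access:

```shell
# Download the Q4_K_M GGUF straight into the Kuwa executor directory.
# Assumes `pip install -U huggingface_hub` has been run beforehand.
huggingface-cli download tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf --local-dir "C:\kuwa\GenAI OS\windows\executors\taide"
```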
### Docker
1. Download the [Kuwa v0.3.4 source code](https://github.com/kuwaai/genai-os/tree/main).
2. Install Kuwa by following the [installation guide](https://github.com/kuwaai/genai-os/blob/main/docs/docker_quick_installation.md); the [community-contributed manual](https://kuwaai.org/zh-Hant/os/User%20Manual) may also be helpful.
3. Download `Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf` from [tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF](https://huggingface.co/tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF) to any directory.
4. Create the configuration file `genai-os/docker/compose/sample/taide-llamacpp.yaml` with the following content:
```yaml
services:
  llamacpp-executor:
    image: kuwaai/model-executor
    environment:
      EXECUTOR_TYPE: llamacpp
      EXECUTOR_ACCESS_CODE: llama-3.1-taide-lx-8b-chat-q4_k_m
      EXECUTOR_NAME: Llama-3.1 TAIDE LX-8B Chat Q4_K_M
      EXECUTOR_IMAGE: llamacpp.png # Refer to src/multi-chat/public/images
    depends_on:
      - executor-builder
      - kernel
      - multi-chat
    command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--temperature", "0"]
    # Or, to use the GPU:
    # command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--ngl", "-1", "--temperature", "0"]
    restart: unless-stopped
    volumes: ["/path/to/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf:/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf"] # Remember to change the host path
    # Uncomment to use the GPU
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           device_ids: ['0']
    #           capabilities: [gpu]
    networks: ["backend"]
```
5. Run `./run.sh` in the `genai-os/docker` directory to start the new model.
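Independently of Kuwa, the downloaded GGUF file can be smoke-tested with llama.cpp directly to confirm it loads and generates text. A sketch assuming a local llama.cpp build (the `llama-cli` binary) and a placeholder model path:

```shell
# Quick sanity check of the quantized model outside Kuwa.
# Adjust the binary and model paths to your environment.
./llama-cli -m /path/to/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf \
    -p "請簡短自我介紹。" -n 128 --temp 0
```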
## How to quantize
For the procedure and pitfalls of quantizing Llama-3.1-TAIDE-LX-8B-Chat, see San-Li Hsu's note ["使用llama.cpp將Hugging Face模型權重(safetensors)轉換成GGUF並進行量化"](https://hackmd.io/@San-Li/S1FEOWk9kg) (using llama.cpp to convert Hugging Face safetensors weights to GGUF and quantize them).
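The overall llama.cpp workflow can be sketched as follows; all paths and the FP16 intermediate filename are placeholders, and details may differ from the note linked above:

```shell
# Sketch of the GGUF conversion + quantization workflow with llama.cpp b4739.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout b4739
pip install -r requirements.txt

# 1. Convert the Hugging Face safetensors checkpoint to an FP16 GGUF.
python convert_hf_to_gguf.py /path/to/Llama-3.1-TAIDE-LX-8B-Chat \
    --outtype f16 --outfile Llama-3.1-TAIDE-LX-8B-Chat-F16.gguf

# 2. Build the quantization tool and quantize the FP16 GGUF to Q4_K_M.
cmake -B build && cmake --build build --config Release -t llama-quantize
./build/bin/llama-quantize Llama-3.1-TAIDE-LX-8B-Chat-F16.gguf \
    Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf Q4_K_M
```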