ifTNT committed on
Commit 4d48258 · verified · 1 Parent(s): d156f3e

Update README.md

Files changed (1)
  1. README.md +53 -3
README.md CHANGED
@@ -7,7 +7,57 @@ base_model: taide/Llama-3.1-TAIDE-LX-8B-Chat
  base_model_relation: quantized
  ---
 
- This repository contains quantized models of [taide/Llama-3.1-TAIDE-LX-8B-Chat](https://huggingface.co/taide/Llama-3.1-TAIDE-LX-8B-Chat) using [llama.cpp](https://github.com/ggml-org/llama.cpp) version b4739.
-
- ## Instructions to run this model in Kuwa
- TODO
+ Models quantized from [taide/Llama-3.1-TAIDE-LX-8B-Chat](https://huggingface.co/taide/Llama-3.1-TAIDE-LX-8B-Chat) using [llama.cpp](https://github.com/ggml-org/llama.cpp) release b4739.
+
+ ## How to run the quantized TAIDE model in Kuwa
+
+ ### Windows version
+
+ 1. Install the [Kuwa v0.3.4 Windows installer](https://kuwaai.org/zh-Hant/release/kuwa-os-v0.3.4)
+ 2. Go to the `C:\kuwa\GenAI OS\windows\executors\taide` directory, back up the original model `taide-8b-a.3-q4_k_m.gguf` outside this directory, and delete `run.bat`
+ 3. Download `Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf` from [tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF](https://huggingface.co/tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF) into the `C:\kuwa\GenAI OS\windows\executors\taide` directory
+ 4. Run `C:\kuwa\GenAI OS\windows\executors\taide` and use the following settings
+    - Enter the option number (1-5): `3`
+    - Enter the model name: `Llama-3.1 TAIDE LX-8B Chat Q4_K_M`
+    - Enter the access code: `llama-3.1-taide-lx-8b-chat-q4_k_m`
+    - Arguments to use (...): `--stop "<|eot_id|>"`
+ 5. Restart Kuwa and the new TAIDE model will be available
+
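The file fetched in step 3 is several gigabytes, so a truncated download is easy to miss. Every GGUF file begins with the 4-byte ASCII magic `GGUF`, which makes a cheap header check possible before restarting Kuwa. A minimal sketch (the synthetic `demo.gguf` stands in for the real model path under `C:\kuwa\GenAI OS\windows\executors\taide`):

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # every GGUF container starts with this 4-byte magic

def looks_like_gguf(path: str) -> bool:
    """Cheap sanity check that a downloaded file is a GGUF container."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC

# Demo on a synthetic file; point this at the downloaded
# Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf in practice.
Path("demo.gguf").write_bytes(GGUF_MAGIC + b"\x03\x00\x00\x00")
print(looks_like_gguf("demo.gguf"))  # -> True
```

This only validates the header, not the full file; comparing the file size against the one shown on the Hugging Face repository page is a useful second check.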
+ ### Docker version
+
+ 1. Download the [Kuwa v0.3.4 source code](https://github.com/kuwaai/genai-os/tree/main)
+ 2. Install Kuwa by following the [installation guide](https://github.com/kuwaai/genai-os/blob/main/docs/docker_quick_installation.md); the [community-contributed manual](https://kuwaai.org/zh-Hant/os/User%20Manual) may also be helpful
+ 3. Add a `genai-os/docker/compose/sample/taide-llamacpp.yaml` config file with the following content
+ ```yaml
+ services:
+   llamacpp-executor:
+     image: kuwaai/model-executor
+     environment:
+       EXECUTOR_TYPE: llamacpp
+       EXECUTOR_ACCESS_CODE: llama-3.1-taide-lx-8b-chat-q4_k_m
+       EXECUTOR_NAME: Llama-3.1 TAIDE LX-8B Chat Q4_K_M
+       EXECUTOR_IMAGE: llamacpp.png # Refer to src/multi-chat/public/images
+     depends_on:
+       - executor-builder
+       - kernel
+       - multi-chat
+     command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--temperature", "0"]
+     # or use GPU
+     # command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--ngl", "-1", "--temperature", "0"]
+     restart: unless-stopped
+     volumes: ["/path/to/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf:/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf"] # Remember to change the path
+     # Uncomment to use GPU
+     # deploy:
+     #   resources:
+     #     reservations:
+     #       devices:
+     #         - driver: nvidia
+     #           device_ids: ['0']
+     #           capabilities: [gpu]
+     networks: ["backend"]
+ ```
+ 4. Run `./run.sh` in the `genai-os/docker` directory to start the new model
+
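A YAML indentation mistake in the new compose file is a common failure at this step, and it is cheaper to catch before invoking `./run.sh`. A minimal pre-flight parse, assuming PyYAML is installed (`pip install pyyaml`) and using a trimmed inline copy of `taide-llamacpp.yaml` (in practice, read the file from disk instead):

```python
import yaml  # PyYAML; assumption: installed separately

# Trimmed mirror of taide-llamacpp.yaml -- replace with
# open("compose/sample/taide-llamacpp.yaml") in real use.
COMPOSE = """
services:
  llamacpp-executor:
    image: kuwaai/model-executor
    environment:
      EXECUTOR_TYPE: llamacpp
      EXECUTOR_ACCESS_CODE: llama-3.1-taide-lx-8b-chat-q4_k_m
      EXECUTOR_NAME: Llama-3.1 TAIDE LX-8B Chat Q4_K_M
    command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--temperature", "0"]
    networks: ["backend"]
"""

cfg = yaml.safe_load(COMPOSE)
svc = cfg["services"]["llamacpp-executor"]
# Indentation mistakes typically surface here as a parse error or KeyError.
assert svc["environment"]["EXECUTOR_TYPE"] == "llamacpp"
assert svc["command"][1].endswith(".gguf")
print("compose fragment parses OK")
```

This only confirms the YAML structure; it does not check that the host path in `volumes` actually exists.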
+ # How to quantize
+
+ For the method used to quantize Llama-3.1-TAIDE-LX-8B-Chat and the challenges involved, see San-Li Hsu's note ["使用llama.cpp將Hugging Face模型權重(safetensors)轉換成GGUF並進行量化"](https://hackmd.io/@San-Li/S1FEOWk9kg) (converting Hugging Face safetensors weights to GGUF with llama.cpp and quantizing them).
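The overall flow is two steps: convert the safetensors checkpoint to an unquantized (f16) GGUF file, then quantize it. A sketch of a small driver around the llama.cpp tools follows; the tool names and paths (`convert_hf_to_gguf.py`, `build/bin/llama-quantize`, the local `llama.cpp` checkout) are assumptions based on recent llama.cpp releases and should be verified against the b4739 checkout:

```python
from pathlib import Path

# Assumed location of a built llama.cpp checkout (illustrative path).
LLAMA_CPP = Path("llama.cpp")  # github.com/ggml-org/llama.cpp at b4739

def convert_cmd(model_dir: Path, out_f16: Path) -> list:
    # convert_hf_to_gguf.py reads Hugging Face safetensors weights and
    # writes an unquantized f16 GGUF file.
    return ["python", str(LLAMA_CPP / "convert_hf_to_gguf.py"), str(model_dir),
            "--outfile", str(out_f16), "--outtype", "f16"]

def quantize_cmd(in_f16: Path, out_gguf: Path, qtype: str = "Q4_K_M") -> list:
    # llama-quantize re-encodes the f16 GGUF with the chosen quantization type.
    return [str(LLAMA_CPP / "build" / "bin" / "llama-quantize"),
            str(in_f16), str(out_gguf), qtype]

f16 = Path("Llama-3.1-TAIDE-LX-8B-Chat-f16.gguf")
q4 = Path("Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf")
print(convert_cmd(Path("Llama-3.1-TAIDE-LX-8B-Chat"), f16))
print(quantize_cmd(f16, q4))
```

Once llama.cpp is cloned and built, each command list can be executed in order with `subprocess.run(cmd, check=True)`; the note linked above covers the details and pitfalls.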