---
license: other
license_name: taide-l-models-community-license-agreement
license_link: >-
  https://drive.google.com/file/d/1zvRktQwKCJesePKLCMWO-ah23xkbIdFB/view?usp=sharing
base_model: taide/Llama-3.1-TAIDE-LX-8B-Chat
base_model_relation: quantized
---

A quantized version of [taide/Llama-3.1-TAIDE-LX-8B-Chat](https://huggingface.co/taide/Llama-3.1-TAIDE-LX-8B-Chat), produced with [llama.cpp](https://github.com/ggml-org/llama.cpp) release b4739.
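
For a quick sanity check outside Kuwa, the file can be loaded directly with the llama.cpp CLI. A minimal sketch, assuming a llama.cpp build around b4739 where the chat binary is named `llama-cli` and is on your `PATH`; the flags mirror the Kuwa configuration below:

```bash
# Interactive chat with the quantized model; -cnv enables conversation mode
# (applies the model's chat template), --temp 0 makes sampling deterministic.
llama-cli -m Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf -cnv --temp 0
```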

## How to run the quantized TAIDE model in Kuwa

### Windows

1. Install the [Kuwa v0.3.4 Windows installer](https://kuwaai.org/zh-Hant/release/kuwa-os-v0.3.4)
2. Go to the `C:\kuwa\GenAI OS\windows\executors\taide` directory, back up the original model `taide-8b-a.3-q4_k_m.gguf` somewhere outside this directory, and delete `run.bat`
3. Download `Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf` from [tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF](https://huggingface.co/tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF) into the `C:\kuwa\GenAI OS\windows\executors\taide` directory (see the download sketch after this list)
4. Run `C:\kuwa\GenAI OS\windows\executors\taide\init.bat` and use the following settings
    - Enter the option number (1-5): `3`
    - Enter the model name: `Llama-3.1 TAIDE LX-8B Chat Q4_K_M`
    - Enter the access code: `llama-3.1-taide-lx-8b-chat-q4_k_m`
    - Arguments to use (...): `--stop "<|eot_id|>"`
5. Restart Kuwa and the new TAIDE model will be available
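
For step 3, instead of downloading through the browser, the file can be fetched with the Hugging Face CLI. A minimal sketch, assuming `huggingface_hub` is installed (`pip install -U huggingface_hub`):

```bash
# Fetch only the Q4_K_M file from the repo straight into the taide executor directory.
huggingface-cli download tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf --local-dir "C:\kuwa\GenAI OS\windows\executors\taide"
```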

### Docker

1. Download the [Kuwa v0.3.4 source code](https://github.com/kuwaai/genai-os/tree/main)
2. Install Kuwa following the [installation guide](https://github.com/kuwaai/genai-os/blob/main/docs/docker_quick_installation.md); the [community-contributed manual](https://kuwaai.org/zh-Hant/os/User%20Manual) is also a useful reference
3. Download `Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf` from [tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF](https://huggingface.co/tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF) to any directory
4. Create the configuration file `genai-os/docker/compose/sample/taide-llamacpp.yaml` with the following content
  ```yaml
  services:
    llamacpp-executor:
      image: kuwaai/model-executor
      environment:
        EXECUTOR_TYPE: llamacpp
        EXECUTOR_ACCESS_CODE: llama-3.1-taide-lx-8b-chat-q4_k_m
        EXECUTOR_NAME: Llama-3.1 TAIDE LX-8B Chat Q4_K_M
        EXECUTOR_IMAGE: llamacpp.png # Refer to src/multi-chat/public/images
      depends_on:
        - executor-builder
        - kernel
        - multi-chat
      command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--temperature", "0"]
      # or use GPU
      # command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--ngl", "-1", "--temperature", "0"]
      restart: unless-stopped
      volumes: ["/path/to/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf:/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf"] # Remember to change path
      # Uncomment to use GPU
      # deploy:
      #   resources:
      #     reservations:
      #       devices:
      #       - driver: nvidia
      #         device_ids: ['0']
      #         capabilities: [gpu]
      networks: ["backend"]
  ```
5. Run `./run.sh` in the `genai-os/docker` directory to start the new model
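
Once the stack is up, you can confirm that the new executor started and loaded the GGUF file. A minimal sketch using plain Docker commands; the actual container name depends on your compose project, so adjust the `name=` filter if needed:

```bash
# Check that the executor container is running, then tail its logs.
docker ps --filter "name=llamacpp-executor"
docker logs -f "$(docker ps -q --filter name=llamacpp-executor)"
```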

## How to quantize

For the method used to quantize Llama-3.1-TAIDE-LX-8B-Chat and the challenges involved, see San-Li Hsu's note ["使用llama.cpp將Hugging Face模型權重(safetensors)轉換成GGUF並進行量化"](https://hackmd.io/@San-Li/S1FEOWk9kg) (Using llama.cpp to convert Hugging Face model weights (safetensors) to GGUF and quantize them).
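
At a high level, that note follows the standard llama.cpp pipeline: convert the safetensors checkpoint to a full-precision GGUF, then quantize it. A minimal sketch, assuming a llama.cpp checkout (around b4739) with its tools built and the original model downloaded locally; paths are illustrative, and the note above covers the model-specific caveats:

```bash
# 1) Convert the Hugging Face checkpoint (safetensors) to an F16 GGUF.
python convert_hf_to_gguf.py /path/to/Llama-3.1-TAIDE-LX-8B-Chat \
  --outfile Llama-3.1-TAIDE-LX-8B-Chat-F16.gguf --outtype f16

# 2) Quantize the F16 GGUF down to Q4_K_M.
./llama-quantize Llama-3.1-TAIDE-LX-8B-Chat-F16.gguf \
  Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf Q4_K_M
```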