yargpt committed
Commit b10a9e3 · 1 Parent(s): 77bdf35

Update README.md

Files changed (1):
  1. README.md +89 -0
README.md CHANGED
@@ -11,3 +11,92 @@ tags:
  <li>
  Original model: <a href="https://huggingface.co/lmsys/vicuna-13b-v1.5"> lmsys/vicuna-13b-v1.5</a>
  </li>
+
+ <!-- README_GGUF.md-about-gguf start -->
+ ### About GGUF
+ GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
+ Here is an incomplete list of clients and libraries that are known to support GGUF:
+ * [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF, providing both a Command Line Interface (CLI) and a server option.
+ * [text-generation-webui](https://github.com/oobabooga/text-generation-webui). The most widely used web UI, with numerous features and powerful extensions; supports GPU acceleration.
+ * [Ollama](https://github.com/jmorganca/ollama). A lightweight and extensible framework for building and running language models locally, with a simple API for creating, managing, and executing models, plus a library of pre-built models for various applications.
+ * [KoboldCpp](https://github.com/LostRuins/koboldcpp). A comprehensive web UI offering GPU acceleration across all platforms and architectures, particularly renowned for storytelling.
+ * [GPT4All](https://gpt4all.io). A free and open source GUI that runs locally, supporting Windows, Linux, and macOS with full GPU acceleration.
+ * [LM Studio](https://lmstudio.ai/). An intuitive and powerful local GUI for Windows and macOS (Silicon), featuring GPU acceleration.
+ * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui). A notable web UI with a variety of unique features, including a comprehensive model library for easy model selection.
+ * [Faraday.dev](https://faraday.dev/). An attractive, user-friendly character-based chat GUI for Windows and macOS (both Silicon and Intel), also offering GPU acceleration.
+ * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server (see the short usage sketch after this list).
+ * [candle](https://github.com/huggingface/candle). A Rust-based ML framework focused on performance, including GPU support, and designed for ease of use.
+ * [ctransformers](https://github.com/marella/ctransformers). A Python library featuring GPU acceleration, LangChain support, and an OpenAI-compatible API server.
+ * [localGPT](https://github.com/PromtEngineer/localGPT). An open-source initiative enabling private conversations with documents.
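+
+ As a quick illustration of the llama-cpp-python entry above, here is a minimal sketch of loading a GGUF file and running a single completion. The file path, prompt format, and parameters below are placeholders and assumptions for illustration, not files or settings confirmed by this repo:
+
+ ```python
+ # Minimal llama-cpp-python sketch: load a local GGUF file and run one completion.
+ # The model path is a placeholder; substitute a GGUF file you actually downloaded.
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="./vicuna-13b-v1.5.Q4_K_M.gguf",  # placeholder path
+     n_ctx=4096,        # context window
+     n_gpu_layers=-1,   # offload all layers if built with GPU support; use 0 for CPU only
+ )
+
+ output = llm(
+     "USER: Write a haiku about llamas.\nASSISTANT:",  # assumed Vicuna-style prompt
+     max_tokens=128,
+     temperature=0.7,
+ )
+ print(output["choices"][0]["text"])
+ ```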
+ <!-- README_GGUF.md-about-gguf end -->
+ <!-- compatibility_gguf start -->
+ ## Explanation of quantisation methods
+ <details>
+ <summary>Click to see details</summary>
+ The new methods available are:
+
+ * GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
+ * GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
+ * GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw (a worked example of this arithmetic follows the list).
+ * GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
+ * GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
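+
+ To make the bpw figures concrete, here is a rough back-of-the-envelope check for GGML_TYPE_Q4_K. It assumes, beyond what the list states, that each super-block also stores one fp16 scale and one fp16 min:
+
+ ```python
+ # Rough bits-per-weight check for GGML_TYPE_Q4_K (sketch; the fp16 super-block
+ # scale and min are an assumption, not stated in the list above).
+ blocks_per_superblock = 8
+ weights_per_block = 32
+ weights = blocks_per_superblock * weights_per_block      # 256 weights per super-block
+
+ quant_bits = weights * 4                                 # 4-bit quantized weights -> 1024 bits
+ scale_min_bits = blocks_per_superblock * (6 + 6)         # 6-bit scale + 6-bit min per block -> 96 bits
+ superblock_bits = 16 + 16                                # assumed fp16 scale + fp16 min -> 32 bits
+
+ bpw = (quant_bits + scale_min_bits + superblock_bits) / weights
+ print(bpw)  # 4.5
+ ```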
+ </details>
+ <!-- compatibility_gguf end -->
+
+ <!-- README_GGUF.md-how-to-download start -->
+ ## How to download GGUF files
+
+ **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.
+
+ The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
+
+ * LM Studio
+ * LoLLMS Web UI
+ * Faraday.dev
+
+ ### In `text-generation-webui`
+
+ Under Download Model, you can enter the model repo, yargpt/vicuna-13b-v1.5-gguf, and below it the filename of the specific GGUF quantisation you want to download from that repo.
+
+ Then click Download.
+
+ ### On the command line, including multiple files at once
+
+ I recommend using the `huggingface-hub` Python library:
+
+ ```shell
+ pip3 install huggingface-hub
+ ```
+
+ Then you can download any individual model file to the current directory, at high speed, with a command like this:
+
+ ```shell
+ huggingface-cli download yargpt/vicuna-13b-v1.5-gguf <filename>.gguf --local-dir . --local-dir-use-symlinks False
+ ```
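+
+ If you prefer to stay in Python, the same `huggingface-hub` library can fetch a single file programmatically. A minimal sketch; the filename is a placeholder for one of the GGUF files actually listed in this repo:
+
+ ```python
+ # Download a single GGUF file with the huggingface_hub Python API (sketch).
+ from huggingface_hub import hf_hub_download
+
+ local_path = hf_hub_download(
+     repo_id="yargpt/vicuna-13b-v1.5-gguf",
+     filename="<filename>.gguf",   # placeholder: use a file listed in the repo
+     local_dir=".",
+ )
+ print(local_path)
+ ```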
+
+ <details>
+ <summary>More advanced huggingface-cli download usage (click to read)</summary>
+
+ You can also download multiple files at once with a pattern:
+
+ ```shell
+ huggingface-cli download yargpt/vicuna-13b-v1.5-gguf --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
+ ```
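+
+ The same pattern-based filtering is available from Python via `snapshot_download`. A sketch mirroring the CLI example above:
+
+ ```python
+ # Download every file in the repo that matches a glob pattern (sketch).
+ from huggingface_hub import snapshot_download
+
+ snapshot_download(
+     repo_id="yargpt/vicuna-13b-v1.5-gguf",
+     allow_patterns=["*Q4_K*gguf"],  # same filter as the --include example above
+     local_dir=".",
+ )
+ ```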
+
+ For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).
+
+ To accelerate downloads on fast connections (1Gbit/s or higher), install `hf_transfer`:
+
+ ```shell
+ pip3 install hf_transfer
+ ```
+
+ And set the environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
+
+ ```shell
+ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download yargpt/vicuna-13b-v1.5-gguf <filename>.gguf --local-dir . --local-dir-use-symlinks False
+ ```
+
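+ If you drive downloads from Python rather than the CLI, the same switch can be set in the environment; a sketch, assuming the variable needs to be set before `huggingface_hub` is imported:
+
+ ```python
+ # Enable hf_transfer for Python downloads (sketch): set the variable before
+ # importing huggingface_hub, on the assumption it is read when the library loads.
+ import os
+ os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
+
+ from huggingface_hub import hf_hub_download
+
+ hf_hub_download(
+     repo_id="yargpt/vicuna-13b-v1.5-gguf",
+     filename="<filename>.gguf",   # placeholder: use a file listed in the repo
+     local_dir=".",
+ )
+ ```
+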
+ Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
+ </details>
+ <!-- README_GGUF.md-how-to-download end -->