Akhil2507 committed
Commit 3901701 · verified · 1 Parent(s): cd2171e

Update README.md


Detailed description of how to convert Hugging Face models into GGUF format.

Files changed (1):
  1. README.md +116 -3
README.md CHANGED
@@ -1,3 +1,116 @@
---
license: apache-2.0
library_name: transformers
---

# GGUF Models: Conversion and Upload to Hugging Face

This guide explains what GGUF models are, how to convert models to GGUF format, and how to upload them to the Hugging Face Hub.

## What is GGUF?

GGUF is a binary file format for storing large language models, optimized for efficient inference on consumer hardware. Key features of GGUF models include:

- Successor to the GGML format
- Designed for efficient quantization and inference
- Supports a wide range of model architectures
- Commonly used with libraries like llama.cpp for running LLMs on consumer hardware
- Allows for reduced model size while maintaining good performance
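
Because a GGUF file stores the weights together with self-describing key-value metadata, you can inspect a model without loading it. Below is a minimal sketch using the `gguf` Python package maintained in the llama.cpp repository (`pip install gguf`); the path `model.gguf` is a placeholder.

```python
# Minimal sketch: inspect GGUF header metadata and tensor info
# without loading the weights. "model.gguf" is a placeholder path.
from gguf import GGUFReader

reader = GGUFReader("model.gguf")

# Print the metadata keys stored in the header
# (architecture, context length, tokenizer details, etc.)
for field in reader.fields.values():
    print(field.name)

# List tensor names and shapes
for tensor in reader.tensors:
    print(tensor.name, tensor.shape)
```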

## Why and How to Convert to GGUF Format

Converting models to GGUF format offers several advantages:

1. **Reduced file size**: GGUF models can be quantized to lower precision (e.g., 4-bit or 8-bit), significantly reducing model size (see the rough estimate below).
2. **Faster inference**: The format is optimized for quick loading and efficient inference on CPUs and consumer GPUs.
3. **Cross-platform compatibility**: GGUF models can be used with libraries like llama.cpp, enabling deployment on various platforms.
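
As a rough, back-of-envelope illustration (an estimate from bits-per-weight, not a measured figure): an 8B-parameter model stored at 16-bit precision takes about 8B × 2 bytes ≈ 16 GB, the same model at 8-bit (q8_0, about 8.5 bits per weight including scaling factors) takes roughly 8-9 GB, and common 4-bit variants come in around 4.5-5 GB.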

To convert a model to GGUF format, we'll use the `convert-hf-to-gguf.py` script from the llama.cpp repository.

### Steps to Convert a Model to GGUF

1. **Clone the llama.cpp repository**:
   ```bash
   git clone https://github.com/ggerganov/llama.cpp.git
   ```

2. **Install the required Python libraries**:
   ```bash
   pip install -r llama.cpp/requirements.txt
   ```

3. **Verify the script and review its options**:
   ```bash
   python llama.cpp/convert-hf-to-gguf.py -h
   ```

4. **Convert the Hugging Face model to GGUF**:
   ```bash
   python llama.cpp/convert-hf-to-gguf.py ./models/8B/Meta-Llama-3-8B-Instruct --outfile Llama3-8B-Instruct-Q8_0.gguf --outtype q8_0
   ```

This command converts the model with 8-bit quantization (`q8_0`). You can also keep the weights in `f16` or `f32` via `--outtype`; lower-bit quantizations such as 4-bit are typically produced afterwards from an `f16`/`f32` GGUF using llama.cpp's separate quantize tool (`llama-quantize` in recent versions).
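
Note that step 4 assumes the Hugging Face model is already available locally under `./models/8B/Meta-Llama-3-8B-Instruct`. If you still need to download it, here is a minimal sketch using `huggingface_hub` (Meta-Llama-3 is a gated repository, so you must accept its license on the Hub and be authenticated first):

```python
# Minimal sketch: download the original Hugging Face model locally
# before conversion. Assumes you have access to the gated repo and
# are logged in (e.g., via `huggingface-cli login`).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    local_dir="./models/8B/Meta-Llama-3-8B-Instruct",
)
```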
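
Before uploading, it is worth sanity-checking that the converted file actually loads and generates text. A minimal sketch, assuming the optional `llama-cpp-python` package (`pip install llama-cpp-python`):

```python
# Minimal smoke test for the converted GGUF file using llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="Llama3-8B-Instruct-Q8_0.gguf")
out = llm("Q: What is the GGUF format? A:", max_tokens=32)
print(out["choices"][0]["text"])
```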

## Uploading GGUF Models to Hugging Face

Once you have your GGUF model, you can upload it to Hugging Face for easy sharing and versioning.

### Prerequisites

- Python 3.8+ (required by recent versions of `huggingface_hub`)
- `huggingface_hub` library installed (`pip install huggingface_hub`)
- A Hugging Face account and API token
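
You can create an API token under your Hugging Face account settings. Authenticating once up front is optional but convenient; a small sketch (the token string is a placeholder):

```python
# Optional: authenticate once so later hub calls can omit the token.
from huggingface_hub import login

login(token="your_huggingface_token_here")  # placeholder token
```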

### Upload Script

Save the following script as `upload_gguf_model.py`:

```python
import os

from huggingface_hub import HfApi

def push_to_hub(hf_token, local_path, model_id):
    api = HfApi(token=hf_token)
    # Create the target repo if it does not exist yet
    api.create_repo(model_id, exist_ok=True, repo_type="model")

    # Upload the GGUF file to the repo root under its own file name
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=os.path.basename(local_path),
        repo_id=model_id,
    )

    print(f"Model successfully pushed to {model_id}")

# Example usage
hf_token = "your_huggingface_token_here"
local_path = "/path/to/your/model.gguf"
model_id = "your-username/your-model-name"

push_to_hub(hf_token, local_path, model_id)
```

### Usage

1. Replace the placeholder values in the script:
   - `your_huggingface_token_here`: Your Hugging Face API token
   - `/path/to/your/model.gguf`: The local path to your GGUF model file
   - `your-username/your-model-name`: Your desired model ID on Hugging Face

2. Run the script:
   ```bash
   python upload_gguf_model.py
   ```
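
Hardcoding a token in a script risks leaking it if the file is shared or committed. A safer variant reads the token from an environment variable (`HF_TOKEN` is the conventional name):

```python
# Read the Hugging Face token from the environment instead of
# hardcoding it in the script. Set it in your shell first, e.g.:
#   export HF_TOKEN=hf_xxx
import os

hf_token = os.environ["HF_TOKEN"]
```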

## Best Practices

- Include a `README.md` file with your model, detailing its architecture, quantization, and usage instructions.
- Add a `config.json` file with model configuration details.
- Include any necessary tokenizer files.
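
If you keep these extra files next to the GGUF file locally, you can push everything in one call. A minimal sketch using `upload_folder` (the token and paths are placeholders):

```python
# Upload an entire local folder (GGUF file, README.md, config.json,
# tokenizer files, ...) to the model repo in one call.
from huggingface_hub import HfApi

api = HfApi(token="your_huggingface_token_here")  # placeholder token
api.upload_folder(
    folder_path="/path/to/your/model/directory",  # placeholder path
    repo_id="your-username/your-model-name",
)
```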

## References

1. [llama.cpp GitHub Repository](https://github.com/ggerganov/llama.cpp)
2. [GGUF Format Discussion](https://github.com/ggerganov/llama.cpp/discussions/2948)
3. [Hugging Face Documentation](https://huggingface.co/docs)

For more detailed information and updates, please refer to the official documentation of llama.cpp and Hugging Face.