Support added into ik_llama.cpp main branch now yay!
README.md CHANGED
@@ -10,14 +10,12 @@ tags:
- ik_llama.cpp
---

-*Note* The ik_llama.cpp PR is still in progress for support in main branch. Until then follow instructions here and keep an eye on the PR: https://github.com/ikawrakow/ik_llama.cpp/pull/668

## `ik_llama.cpp` imatrix Quantizations of zai-org/GLM-4.5-Air

This quant collection **REQUIRES** the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork to support ik's latest SOTA quants and optimizations! Do **not** download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.!

*NOTE* `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc. if you want to try it out before downloading my quants.

-Some of ik's new quants are supported with [Nexesenex/croco.cpp](https://github.com/Nexesenex/croco.cpp) fork of KoboldCPP.
+Some of ik's new quants are supported with the [Nexesenex/croco.cpp](https://github.com/Nexesenex/croco.cpp) fork of KoboldCPP, which has Windows builds for CUDA 12.9. Also check the [Windows builds by Thireus](https://github.com/Thireus/ik_llama.cpp/releases), which are built against CUDA 12.8.

These quants provide best-in-class perplexity for the given memory footprint.
@@ -411,18 +409,15 @@ numactl -N 1 -m 1 \

If you want to disable thinking, add `/nothink` (correct, no underscore) at the *end* of your prompt.
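
For example, a minimal request sketch (assuming the server started below is listening on llama-server's default port 8080 with its OpenAI-compatible chat endpoint, and that the model name matches the `--alias` set below):

```bash
# Hypothetical request: the trailing /nothink in the user message
# disables the model's thinking block for this turn.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ubergarm/GLM-4.5-Air-IQ4_KSS",
        "messages": [
          {"role": "user", "content": "Summarize this model card in one sentence. /nothink"}
        ]
      }'
```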

```bash
# Clone and checkout
$ git clone https://github.com/ikawrakow/ik_llama.cpp
$ cd ik_llama.cpp
-$ git remote add Thireus https://github.com/Thireus/ik_llama.cpp.git
-$ git fetch Thireus
-$ git checkout glm-4.5-clean

# Build for hybrid CPU+CUDA
$ cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1
$ cmake --build build --config Release -j $(nproc)
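# (Hypothetical CPU-only variant: reconfigure with -DGGML_CUDA=OFF in the
# cmake step above and rebuild; the remaining flags stay the same.)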

-#
+# Run API server
$ ./build/bin/llama-server \
    --model GLM-4.5-Air-IQ4_KSS-00001-of-00002.gguf \
    --alias ubergarm/GLM-4.5-Air-IQ4_KSS \
```
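
Once the server is up, a quick liveness check (a sketch, assuming the fork keeps mainline llama-server's `/health` endpoint and the default `127.0.0.1:8080` bind):

```bash
# Returns a small JSON status once the model has finished loading.
curl -s http://127.0.0.1:8080/health
```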