ubergarm committed
Commit 482e5b8 · 1 Parent(s): 1ccb47f

Support added to ik_llama.cpp main branch now, yay!

Files changed (1)
  1. README.md +3 -8
README.md CHANGED
@@ -10,14 +10,12 @@ tags:
 - ik_llama.cpp
 ---
 
-*Note* The ik_llama.cpp PR is still in progress for support in main branch. Until then follow the instructions here and keep an eye on the PR: https://github.com/ikawrakow/ik_llama.cpp/pull/668
-
 ## `ik_llama.cpp` imatrix Quantizations of zai-org/GLM-4.5-Air
 This quant collection **REQUIRES** the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork to support ik's latest SOTA quants and optimizations! Do **not** download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc!
 
 *NOTE* `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc. if you want to try it out before downloading my quants.
 
-Some of ik's new quants are supported with the [Nexesenex/croco.cpp](https://github.com/Nexesenex/croco.cpp) fork of KoboldCPP.
+Some of ik's new quants are supported with the [Nexesenex/croco.cpp](https://github.com/Nexesenex/croco.cpp) fork of KoboldCPP, which has Windows builds for CUDA 12.9. Also check the [Windows builds by Thireus](https://github.com/Thireus/ik_llama.cpp/releases), which have been built against CUDA 12.8.
 
 These quants provide best in class perplexity for the given memory footprint.
 
@@ -411,18 +409,15 @@ numactl -N 1 -m 1 \
 If you want to disable thinking, add `/nothink` (correct, no underscore) at the *end* of your prompt.
 
 ```bash
-# Clone and checkout experimental PR (hopefully merged into main soon)
+# Clone and checkout
 $ git clone https://github.com/ikawrakow/ik_llama.cpp
 $ cd ik_llama.cpp
-$ git remote add Thireus https://github.com/Thireus/ik_llama.cpp.git
-$ git fetch Thireus
-$ git checkout glm-4.5-clean
 
 # Build for hybrid CPU+CUDA
 $ cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1
 $ cmake --build build --config Release -j $(nproc)
 
-# Test Experimental GGUF
+# Run API server
 $ ./build/bin/llama-server \
 --model GLM-4.5-Air-IQ4_KSS-00001-of-00002.gguf \
 --alias ubergarm/GLM-4.5-Air-IQ4_KSS \
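Once the server from the updated snippet is running, prompts go to llama-server's OpenAI-compatible chat endpoint. Below is a minimal sketch, assuming the default `127.0.0.1:8080` bind (override with `--host`/`--port`) and the `--alias` shown above; the trailing `/nothink` disables thinking as noted in the README:

```bash
# Minimal sketch of querying the running server. Assumptions: default
# host/port (llama-server binds 127.0.0.1:8080 unless --host/--port are set);
# "model" matches the --alias passed to llama-server above.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ubergarm/GLM-4.5-Air-IQ4_KSS",
    "messages": [
      {"role": "user", "content": "Summarize the GGUF file format in two sentences. /nothink"}
    ]
  }'
```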