unsubscribe committed
Commit 9b7d181 · verified · 1 parent: c3b4411

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -48,7 +48,7 @@ For instance, `internlm3-8b-instruct-fp16.gguf` can be downloaded as below:
 
 ```shell
 pip install huggingface-hub
-huggingface-cli download internlm/internlm3-8b-instruct-gguf internlm3-8b-instruct-fp16.gguf --local-dir . --local-dir-use-symlinks False
+huggingface-cli download internlm/internlm3-8b-instruct-gguf internlm3-8b-instruct.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 ## Inference
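Note: the hunk header above still shows `internlm3-8b-instruct-fp16.gguf` in the prose near line 48, so a stale mention of the old filename may remain outside this hunk. A quick check on a local checkout (a sketch; assumes README.md sits in the repo root):

```shell
# List any lines still mentioning the old fp16 filename; no output means
# the rename was applied everywhere in the README.
grep -n "internlm3-8b-instruct-fp16.gguf" README.md
```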
@@ -59,7 +59,7 @@ You can use `llama-cli` for conducting inference. For a detailed explanation of
 
 ```shell
 build/bin/llama-cli \
---model internlm3-8b-instruct-fp16.gguf  \
+--model internlm3-8b-instruct.gguf  \
 --predict 512 \
 --ctx-size 4096 \
 --gpu-layers 48 \
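For readers tuning this command: `--predict` caps the number of generated tokens, `--ctx-size` sets the context window, and `--gpu-layers` is how many layers to offload to the GPU. A minimal CPU-only smoke test with the renamed file (a sketch; the prompt text is illustrative):

```shell
# CPU-only run: zero offloaded layers, short generation to verify the file loads.
build/bin/llama-cli \
  --model internlm3-8b-instruct.gguf \
  --predict 128 \
  --ctx-size 4096 \
  --gpu-layers 0 \
  --prompt "Hello"
```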
@@ -84,7 +84,7 @@ build/bin/llama-cli \
 
 ```shell
 build/bin/llama-cli \
---model internlm3-8b-instruct-fp16.gguf \
+--model internlm3-8b-instruct.gguf \
 --predict 512 \
 --ctx-size 4096 \
 --gpu-layers 48 \
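Both `llama-cli` invocations assume all 48 layers fit in VRAM. If they do not, a smaller `--gpu-layers` value splits the model between GPU and CPU; the count below is a guess to tune per device:

```shell
# Partial offload for smaller GPUs: the remaining layers run on the CPU.
build/bin/llama-cli \
  --model internlm3-8b-instruct.gguf \
  --predict 512 \
  --ctx-size 4096 \
  --gpu-layers 24
```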
@@ -123,10 +123,10 @@ The current temperature in Shanghai is 22 degrees Celsius.<|im_end|>
 
 ## Serving
 
-`llama.cpp` provides an OpenAI API compatible server - `llama-server`. You can deploy `internlm3-8b-instruct-fp16.gguf` into a service like this:
+`llama.cpp` provides an OpenAI API compatible server - `llama-server`. You can deploy `internlm3-8b-instruct.gguf` into a service like this:
 
 ```shell
-./build/bin/llama-server -m ./internlm3-8b-instruct-fp16.gguf -ngl 48
+./build/bin/llama-server -m ./internlm3-8b-instruct.gguf -ngl 48
 ```
 
 At the client side, you can access the service through OpenAI API:
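To verify the renamed model end to end, a client request against `llama-server`'s OpenAI-compatible endpoint can look like the sketch below (assumes the default address `127.0.0.1:8080`; `llama-server` serves whatever model it was launched with, so the `model` field is informational):

```shell
# Query the OpenAI-compatible chat completions endpoint exposed by llama-server.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "internlm3-8b-instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```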
 