Update README.md
README.md CHANGED
````diff
@@ -62,18 +62,14 @@ In the following demonstration, we assume that you are running commands under the
 You can run Qwen3 Embedding with one command:
 
 ```shell
-./build/bin/llama-embedding -m model.gguf -p "<your context here
+./build/bin/llama-embedding -m model.gguf -p "<your context here>" --pooling last --verbose-prompt
 ```
 
-Or
+Or launch a server:
 ```shell
 ./build/bin/llama-server -m model.gguf --embedding --pooling last -ub 8192 --verbose-prompt
 ```
 
-📌 **Tip**: Qwen3 Embedding models default to using the last token as `<|endoftext|>`, so you need to manually append this token to the end of your own input context. In addition, when running the `llama-server`, you also need to manually normalize the output embeddings as `llama-server` currently does not support the `--embd-normalize` option.
-
-
-
 
 ## Evaluation
````
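For reference, the tip removed above described two manual steps when using the server: appending `<|endoftext|>` to the input (the model pools on the last token), and L2-normalizing the returned embeddings yourself, since `llama-server` does not support the `--embd-normalize` option. Below is a minimal Python sketch of that workflow; the `/v1/embeddings` endpoint path, the default port `8080`, and the response shape are assumptions about the llama.cpp server, not taken from this diff.

```python
import json
import math
import urllib.request

# Assumption (not from the diff): llama-server is listening on
# http://localhost:8080 and exposes an OpenAI-compatible POST /v1/embeddings
# endpoint when started with --embedding.
SERVER = "http://localhost:8080/v1/embeddings"

def embed(text: str) -> list[float]:
    # Per the removed tip: append <|endoftext|> manually so last-token
    # pooling ("--pooling last") sees the expected end-of-text token.
    body = json.dumps({"input": text + "<|endoftext|>"}).encode("utf-8")
    req = urllib.request.Request(
        SERVER, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        vec = json.load(resp)["data"][0]["embedding"]
    # Per the removed tip: llama-server does not apply --embd-normalize,
    # so L2-normalize the embedding manually.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# With normalized vectors, cosine similarity reduces to a plain dot product.
q = embed("What is the capital of China?")
d = embed("Beijing is the capital of China.")
print(sum(a * b for a, b in zip(q, d)))
```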