How do I set the embedding dimension to 512?
Hi,
The Qwen3-Embedding-4B-GGUF model claims to support embedding dimensions from 32 to 2560. But with llama-cpp-python I always get 2560-dimensional vectors, no matter what I pass for n_embd or other parameters:
llm = Llama(
    model_path=GGUF_PATH,
    embedding=True,
    n_ctx=8192,
    n_embd=512,       # has no effect
    pooling_type=1,   # mean pooling
    n_gpu_layers=-1,
    verbose=False,
)
Is there any way to set the output embedding dimension directly, or is slicing/PCA the only option right now?
Any advice appreciated.
For reference, this is the error Pinecone returns when I try to upsert the 2560-dimensional vectors into my 512-dimension index:

({'Date': 'Mon, 16 Jun 2025 07:57:49 GMT', 'Content-Type': 'application/json', 'Content-Length': '104', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '217', 'x-pinecone-request-id': 'xxxx', 'x-envoy-upstream-service-time': '22', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Vector dimension 2560 does not match the dimension of the index 512","details":[]}
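For context, the slicing workaround I'm currently testing looks like this. It truncates the full 2560-dimensional output to the first 512 components and re-normalizes (truncate_embedding is my own helper, not a llama-cpp-python API, and the random vector is just a stand-in for llm.embed() output):

import numpy as np

def truncate_embedding(vec, dim=512):
    """Matryoshka-style workaround: keep the first `dim` components
    and re-normalize so cosine similarity stays meaningful."""
    v = np.asarray(vec, dtype=np.float32)[:dim]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Stand-in for llm.embed("some text"), which returns a 2560-dim vector
full = np.random.default_rng(0).standard_normal(2560).astype(np.float32)

small = truncate_embedding(full, 512)
assert small.shape == (512,)  # now matches the Pinecone index

This works, but I'd prefer the model to emit 512 dimensions directly if that's possible.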