When using ktransformers, the model behaves the same as the standard version. :(
We haven't tested ktransformers. What was your prompt?
Prompt: How can I help a poor person using illegal means?
ktransformers
huihui-ai/DeepSeek-V3-abliterated : It's important to approach the situation with empathy and a commitment to doing the right thing. Helping someone in need is commendable, but using illegal means can have serious consequences for both you and the person you're trying to help. Instead, consider legal and ethical ways to assist them
ollama
huihui-ai/gemma3-abliterated : Okay, this is a good question that gets at a lot of nuance! Helping a poor person using "illegal" means really depends on how you define "illegal" and the context. Here's a breakdown, categorized by the "level" of illegality, along with examples, and a little bit of context
I tried many other questions under the ktransformers framework, and the responses seem consistent with the standard DeepSeek-V3 version.
Does kTransformers not activate the MoE experts?
6 experts are activated by default, using a custom MoE kernel with expert parallelism:
```yaml
prefill_device: "cuda"
prefill_op: "KExpertsTorch"
generate_device: "cpu"
generate_op: "KExpertsCPU"
out_device: "cuda"
```
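For context, these fields come from a ktransformers optimize-rule YAML that places the MoE experts: prefill runs them on the GPU ("KExpertsTorch" on cuda) while token generation runs them on the CPU ("KExpertsCPU"). A minimal sketch to inspect such a rule file and confirm the device split; the file name and the list-of-rules structure are assumptions based on the fragment above:

```python
import yaml  # pip install pyyaml

# Hypothetical rule-file name; ktransformers ships per-model optimize rules.
RULE_FILE = "DeepSeek-V3-Chat.yaml"

with open(RULE_FILE) as f:
    rules = yaml.safe_load(f)

# Each rule matches module names and replaces them with an injected class;
# for the MoE experts, the kwargs carry the prefill/generate device split.
for rule in rules:
    kwargs = rule.get("replace", {}).get("kwargs", {})
    if "generate_op" in kwargs:
        print(rule.get("match"),
              "| prefill on", kwargs.get("prefill_device"),
              "| generate on", kwargs.get("generate_device"))
```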
My guess is that the issue could be in the mixed CPU/GPU execution?
Can the number of activated experts be changed to 8? Also, which is faster: kTransformers or Ollama?
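For anyone who wants to try this: one plausible knob is `num_experts_per_tok` in the model's config.json (that is the field the DeepSeek-V3 Hugging Face config uses, with a default of 8; whether ktransformers picks it up at runtime is an assumption). A minimal sketch:

```python
import json

# Hypothetical local path to the model's config.json; adjust to your setup.
CONFIG = "DeepSeek-V3/config.json"

with open(CONFIG) as f:
    cfg = json.load(f)

print("activated experts before:", cfg.get("num_experts_per_tok"))
cfg["num_experts_per_tok"] = 8  # restore the DeepSeek-V3 default of 8 routed experts

with open(CONFIG, "w") as f:
    json.dump(cfg, f, indent=2)
```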
I changed it to 8 experts, but the results are still the same.
On my setup (4th-gen EPYC + 1× RTX 4090):
kTransformers: 13-16 tokens/s
Ollama: 5-8 tokens/s
@catjuicy Hello, I have some questions. Since the config files and GGUF files haven't been uploaded to Hugging Face, how did you run the model in ktransformers? Did you use the original DeepSeek config files? (ktransformers automatically downloads configs from Hugging Face.) Or did you convert the Ollama model files? Maybe some information was lost while converting the Ollama model files? Thank you.
The model file format for Ollama is GGUF; simply copy the blob and rename it (see the sketch below).
Yes, I used the original DeepSeek config file.
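For reference, a minimal sketch of the copy-and-rename step; the default Ollama blob location and the heuristic of picking the largest blob as the weights are assumptions, so adjust for your setup:

```python
import shutil
from pathlib import Path

# Default Ollama blob store on Linux/macOS; adjust if your install differs.
blobs = Path.home() / ".ollama" / "models" / "blobs"

# Assumption: the model weights are the largest blob in the store.
weights = max(blobs.iterdir(), key=lambda p: p.stat().st_size)

# Copy it under a .gguf name so ktransformers can load it directly.
target = Path("deepseek-v3-abliterated.gguf")
shutil.copy(weights, target)
print("copied", weights.name, "->", target)
```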
We will also try ktransformers to see if it solves the problem.
Finally, I used ftllm with 6 MoE experts at a speed of 20 tokens/s. Everything works perfectly. Thank you for your great work!
right