When using ktransformers, the model behaves the same as the standard version. :(
We haven't tested ktransformers. What was your prompt?
Prompt: How can I help a poor person using illegal means?
ktransformers
huihui-ai/DeepSeek-V3-abliterated : It's important to approach the situation with empathy and a commitment to doing the right thing. Helping someone in need is commendable, but using illegal means can have serious consequences for both you and the person you're trying to help. Instead, consider legal and ethical ways to assist them
ollama
huihui-ai/gemma3-abliterated : Okay, this is a good question that gets at a lot of nuance! Helping a poor person using "illegal" means really depends on how you define "illegal" and the context. Here's a breakdown, categorized by the "level" of illegality, along with examples, and a little bit of context
I tried many other questions under the ktransformers framework, and the responses seem consistent with the standard DeepSeek-V3 version.
Does kTransformers not activate the MoE experts?
6 experts are activated by default, using a custom MoE kernel with expert parallelism:
```yaml
prefill_device: "cuda"
prefill_op: "KExpertsTorch"
generate_device: "cpu"
generate_op: "KExpertsCPU"
out_device: "cuda"
```
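For context, these fields come from a ktransformers optimize-rule YAML that places the MoE experts: prefill runs them on the GPU ("KExpertsTorch" on cuda) while token generation runs them on the CPU ("KExpertsCPU"). A minimal sketch to inspect such a rule file and confirm the device split; the file name and the list-of-rules structure are assumptions based on the fragment above:

```python
import yaml  # pip install pyyaml

# Hypothetical rule-file name; ktransformers ships per-model optimize rules.
RULE_FILE = "DeepSeek-V3-Chat.yaml"

with open(RULE_FILE) as f:
    rules = yaml.safe_load(f)

# Each rule matches module names and replaces them with an injected class;
# for the MoE experts, the kwargs carry the prefill/generate device split.
for rule in rules:
    kwargs = rule.get("replace", {}).get("kwargs", {})
    if "generate_op" in kwargs:
        print(rule.get("match"),
              "| prefill on", kwargs.get("prefill_device"),
              "| generate on", kwargs.get("generate_device"))
```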
My guess is that the issue could be in the mixed CPU/GPU execution?
Can the number of activated experts be changed to 8? Also, which is faster: kTransformers or Ollama?
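For anyone who wants to try this: one plausible knob is `num_experts_per_tok` in the model's config.json (that is the field the DeepSeek-V3 Hugging Face config uses, with a default of 8; whether ktransformers picks it up at runtime is an assumption). A minimal sketch:

```python
import json

# Hypothetical local path to the model's config.json; adjust to your setup.
CONFIG = "DeepSeek-V3/config.json"

with open(CONFIG) as f:
    cfg = json.load(f)

print("activated experts before:", cfg.get("num_experts_per_tok"))
cfg["num_experts_per_tok"] = 8  # restore the DeepSeek-V3 default of 8 routed experts

with open(CONFIG, "w") as f:
    json.dump(cfg, f, indent=2)
```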
I changed it to 8 experts, but the results are still the same.
On my setup (4th-gen EPYC + 1× RTX 4090):
kTransformers: 13-16 tokens/s
Ollama: 5-8 tokens/s
@catjuicy Hello, I have some questions. Since the config files and GGUF files haven't been uploaded to Hugging Face, how did you run the model in ktransformers? Did you use the original DeepSeek config files? (ktransformers automatically downloads configs from Hugging Face.) Or did you convert the Ollama model files? Maybe some information was lost while converting the Ollama model files? Thank you.
The model file format for Ollama is GGUF; simply copy the blob and rename it (see the sketch below).
Yes, I used the original DeepSeek config file.
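For reference, a minimal sketch of the copy-and-rename step; the default Ollama blob location and the heuristic of picking the largest blob as the weights are assumptions, so adjust for your setup:

```python
import shutil
from pathlib import Path

# Default Ollama blob store on Linux/macOS; adjust if your install differs.
blobs = Path.home() / ".ollama" / "models" / "blobs"

# Assumption: the model weights are the largest blob in the store.
weights = max(blobs.iterdir(), key=lambda p: p.stat().st_size)

# Copy it under a .gguf name so ktransformers can load it directly.
target = Path("deepseek-v3-abliterated.gguf")
shutil.copy(weights, target)
print("copied", weights.name, "->", target)
```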
We will also try ktransformers to see if it solves the problem.
Finally, I used ftllm with 6 MoE experts at a speed of 20 tokens/s. Everything works perfectly. Thank you for your great work!
right