671B params or 685B params?
Why does this repo say 671B params? 0324 is 685B params. Is that a typo, or is this the wrong model?
Inspired by Gloeckle et al. (2024), we investigate and set a Multi-Token Prediction (MTP) objective for DeepSeek-V3, which extends the prediction scope to multiple future tokens at each position. On the one hand, an MTP objective densifies the training signals and may improve data efficiency. On the other hand, MTP may enable the model to pre-plan its representations for better prediction of future tokens. Figure 3 illustrates our implementation of MTP. Different from Gloeckle et al. (2024), which predicts D additional tokens in parallel using independent output heads, we sequentially predict additional tokens and keep the complete causal chain at each prediction depth. We introduce the details of our MTP implementation in this section.
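To make that sequential chaining concrete, here's a toy numpy sketch (made-up sizes and random weights, not the paper's actual MTP module, which is richer, e.g. it runs a full Transformer block at each depth). Depth k combines the previous depth's representation with the embedding of a known future token, so each depth predicts one position further ahead while keeping the causal chain:

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, seq, depth = 8, 16, 5, 2  # toy sizes, not DeepSeek's

emb = rng.normal(size=(vocab, d))                            # shared embedding
proj = [rng.normal(size=(2 * d, d)) for _ in range(depth)]   # per-depth combiner
head = rng.normal(size=(d, vocab))                           # shared output head

tokens = rng.integers(0, vocab, size=seq)
h = emb[tokens]  # stand-in for the main model's hidden states

# Sequentially chain depths: at depth k, position i conditions on the
# previous depth's representation AND the embedding of a later (known,
# during training) token, then predicts one position further ahead.
logits_per_depth = []
for k in range(depth):
    prev = h[: seq - k - 1]           # representations carried from depth k-1
    nxt = emb[tokens[k + 1:]]         # embeddings of the shifted future tokens
    h = np.concatenate([prev, nxt], axis=-1) @ proj[k]
    logits_per_depth.append(h @ head)

for k, lg in enumerate(logits_per_depth):
    print(k, lg.shape)  # each extra depth covers one fewer position
```

Each depth shortens the sequence by one, since the last positions have no future token to condition on.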
DeepSeek-V3 has an extra output head that was used to speed up training; if the inference code supports it, it can also serve as a draft model to further speed up inference.
Right now, most GGUFs don't include the extra output head since I guess there isn't good support for it at the moment.
DeepSeek-V3 with extra output head = 685B params
DeepSeek-V3 with only the normal head = 671B params
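If you have the original safetensors shards, you can sanity-check the parameter count yourself by summing tensor shapes from each shard's header. A sketch assuming the standard safetensors layout (shard filenames vary by repo):

```python
import json
import struct

def safetensors_param_count(path):
    """Sum tensor element counts from one .safetensors shard's header.

    A safetensors file starts with an 8-byte little-endian header length,
    followed by a JSON header mapping tensor names to dtype/shape/offsets;
    we never need to read the weight data itself.
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n))
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional metadata entry, not a tensor
            continue
        count = 1
        for dim in info["shape"]:
            count *= dim
        total += count
    return total

# Summing this over all shards should land near 6.85e11 with the MTP head
# included, or 6.71e11 without it.
```

This only needs the headers, so it's fast even on multi-hundred-GB checkpoints.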
Hmm. Is there any way to verify the instance is in fact 0324?
You mean you think we uploaded the wrong model or something?
It's less about you and more about the user-uploaded models on Ollama. See https://huggingface.co/deepseek-ai/DeepSeek-V3-0324/discussions/17#67e4d36dfe1f5acc680ecd0b
It would just be nice if DeepSeek-V3-0324 had some way to verify it is who people claim it is. Instead, it thinks it's GPT-4o, so it's difficult to be certain.
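One practical check for the files themselves: Hugging Face shows a SHA-256 for every LFS file on its file page, so you can hash your local copy and compare before converting. A minimal sketch (the filename in the comment is just an example):

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    """Stream-hash a large file in 1 MiB chunks, without loading it into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage against a real shard would look like:
#   sha256_file("DeepSeek-V3-0324-Q8_0.gguf") == "<sha256 from the HF file page>"
```

This verifies you got the bytes the uploader published, though of course not what the uploader's own conversion was derived from.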
I only have 3.5 TB on my data volume, so I didn't have enough disk space to create an Ollama model from your Q8 GGUFs personally. I merged them, then deleted the original shards. I had something like 677 GB used and the rest of the volume free, and the ollama create command still ran out of space.
I make my living in the cloud. Professionally, I've been indoctrinated to not be an inherently trusting person. 😂
Oh yea, unfortunately for other uploaders it's gonna be hard to verify, but they should generally be correct. You can always trust our uploads, as we work with many teams (Google etc.) to fix bugs in models.
It blew my mind a bit when I saw your video - why would anyone put a system prompt in a model file? Ollama is weird.