Bug: DeepSeek-V3-0324-GGUF-UD incorrectly identifies itself as GPT-4-turbo

#5
by emlen - opened

Issue Title:

Bug: DeepSeek-V3-0324-GGUF-UD incorrectly identifies itself as GPT-4-turbo

Description:

When prompting the unsloth/DeepSeek-V3-0324-GGUF-UD model about its specifications (context size, model size, and release date), it incorrectly responds as if it were OpenAI's GPT-4-turbo, providing misleading information.

Example Prompt:
"What is your maximum context size and model size? State your version and release date."

Actual Output:

"I am ChatGPT, based on the GPT-4 architecture (exact model: GPT-4-turbo).  
- Max context: ~128K tokens  
- Model size: Hundreds of billions of parameters (exact number undisclosed by OpenAI)  
- Version: GPT-4-turbo-preview (2023–24 updates)"  

Expected Behavior:

The model should correctly identify itself as DeepSeek-V3, providing accurate details such as:

  • Max context: 128K tokens (correct, but should not claim OpenAI affiliation)
  • Model size: 671B total parameters (MoE, ~37B activated per token, per DeepSeek's published specs)
  • Version: DeepSeek-V3-0324 (March 2025 checkpoint, per filename 0324)

Possible Causes:

  1. Training Data Contamination: The model may have been fine-tuned on datasets containing ChatGPT/GPT-4 responses, causing identity confusion.
  2. Metadata Loss in GGUF Conversion: Critical model identity tags might have been lost during quantization/conversion.
  3. Distillation Artifacts: the base model may have been trained on GPT-4-distilled data, copying GPT-4's self-description patterns (note: "UD" in Unsloth's GGUF naming appears to denote Unsloth Dynamic quantization, not "Unsloth Distilled").
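Cause 2 can be checked directly: GGUF files carry identity metadata such as `general.name` and `general.architecture` in their header, and llama.cpp's gguf-py package ships a dump script for inspecting it. The minimal pure-Python reader below is a sketch of the same idea, based on the GGUF v2+ header layout (little-endian magic, version, tensor count, KV count, then key/value pairs); it extracts only the string-valued keys:

```python
import struct

GGUF_MAGIC = b"GGUF"

# GGUF metadata value types (per the GGUF spec): fixed sizes in bytes,
# None for variable-length types (8 = string, 9 = array).
_TYPE_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1,
               8: None, 9: None, 10: 8, 11: 8, 12: 8}

def _read_string(buf, off):
    # GGUF string: uint64 byte length, then UTF-8 bytes (no terminator).
    (n,) = struct.unpack_from("<Q", buf, off)
    off += 8
    return buf[off:off + n].decode("utf-8"), off + n

def _skip_value(buf, off, vtype):
    # Return (value_or_None, new_offset); only strings are materialized.
    size = _TYPE_SIZES[vtype]
    if size is not None:
        return None, off + size
    if vtype == 8:  # string
        return _read_string(buf, off)
    # array: element type (uint32) + count (uint64) + elements
    etype, count = struct.unpack_from("<IQ", buf, off)
    off += 12
    for _ in range(count):
        _, off = _skip_value(buf, off, etype)
    return None, off

def read_gguf_metadata(path):
    """Return the string-valued metadata keys of a GGUF file as a dict."""
    with open(path, "rb") as f:
        buf = f.read()
    assert buf[:4] == GGUF_MAGIC, "not a GGUF file"
    version, _n_tensors, n_kv = struct.unpack_from("<IQQ", buf, 4)
    assert version >= 2, "uint32-count GGUF v1 layout not handled here"
    off = 24  # magic(4) + version(4) + tensor_count(8) + kv_count(8)
    meta = {}
    for _ in range(n_kv):
        key, off = _read_string(buf, off)
        (vtype,) = struct.unpack_from("<I", buf, off)
        off += 4
        val, off = _skip_value(buf, off, vtype)
        if vtype == 8:
            meta[key] = val
    return meta
```

If `general.name` in the quantized file still reads as a DeepSeek identifier, the misidentification is coming from the weights (training data), not from metadata loss during conversion.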

Steps to Reproduce:

  1. Load unsloth/DeepSeek-V3-0324-GGUF-UD in LM Studio or llama.cpp.
  2. Ask: "What are your max context, model size, version, and release date?"
  3. Observe the incorrect GPT-4-turbo self-identification.
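The steps above can be scripted for repeatable reproduction. The sketch below drives llama.cpp's `llama-cli` via its `-m`/`-p` flags; the shard filename is illustrative (substitute whichever quant was actually downloaded from the repo), and `llama-cli` is assumed to be on `PATH`:

```python
import subprocess

# Illustrative quant filename; substitute the shard actually downloaded.
MODEL = "DeepSeek-V3-0324-UD-IQ1_S.gguf"
PROMPT = ("What are your max context, model size, "
          "version, and release date?")

def build_cmd(model: str = MODEL, prompt: str = PROMPT) -> list[str]:
    """Assemble the llama-cli invocation: -m <model>, -p <prompt>."""
    return ["llama-cli", "-m", model, "-p", prompt]

if __name__ == "__main__":
    # Requires a llama.cpp build with llama-cli and the model file present.
    subprocess.run(build_cmd(), check=True)
```

Any GPT-4-turbo self-identification in the completion confirms the bug.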

Request:

  • Please verify and fix the model’s self-description behavior.
  • Clarify what "UD" stands for in this model (presumably Unsloth Dynamic quantization, but this is not stated on the model card).

Additional Notes:

  • This issue could confuse users expecting DeepSeek-V3-specific behavior.
  • The model otherwise works well—only its self-identification is faulty.
