1B-IT Model | Prone to hallucinations?

#1
by AEsau - opened

I recently ran the 1B-IT model locally and verified, with both the existing GGUF and a second download, that inference is not working well; screenshots of the garbled output are attached.


Base parameters, no settings changes. If anything else is needed/desired, I can provide more details. Cheers!
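
For anyone who wants to reproduce this outside the chat UI, here's a minimal sketch against LM Studio's OpenAI-compatible local server (default port 1234). The model identifier below is an assumption; check what your instance actually reports.

```python
# Minimal repro sketch against LM Studio's OpenAI-compatible local server.
# Assumes the server is running on the default port with the 1B-IT GGUF
# loaded; the model identifier below is illustrative, not authoritative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="gemma-3-1b-it",  # assumed identifier; list models via GET /v1/models
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```

With the Vulkan v1.21.0 runtime loaded this should show the same garbled output as the screenshots; on the CPU-only engine it answers normally.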

I think I've figured out the main issue: the Vulkan llama.cpp v1.21.0 runtime.
I switched over to the CPU-only engine and it's working without issue, albeit with slower inference.
I'm going to attempt another run with a previous Vulkan runtime engine to see if there are any differences; I'll post notes here.

Confirmed: Vulkan llama.cpp v1.21.0 is the culprit. An older runtime engine such as v1.20.x works without issue, and CPU-only also works just fine.
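
For cross-checking outside LM Studio's bundled runtimes, here's a rough sketch with llama-cpp-python, a separate binding over the same llama.cpp core. Whether the GPU path is actually exercised depends on how the wheel was built (e.g. with the Vulkan backend enabled), and the model path is assumed.

```python
# Compare CPU-only vs. full GPU offload with llama-cpp-python to isolate the
# backend. n_gpu_layers=0 forces CPU, mirroring the working setup;
# n_gpu_layers=-1 offloads every layer, analogous to offloadRatio: 1 in the
# config posted below.
from llama_cpp import Llama

MODEL_PATH = "gemma-3-1b-it-Q4_K_M.gguf"  # assumed local path; adjust as needed

for n_gpu_layers, label in [(0, "CPU only"), (-1, "full offload")]:
    llm = Llama(model_path=MODEL_PATH, n_ctx=4096, n_threads=6,
                n_gpu_layers=n_gpu_layers, verbose=False)
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "List the planets in order."}],
        temperature=0.5,
        max_tokens=128,
    )
    print(f"--- {label} ---")
    print(out["choices"][0]["message"]["content"])
```

If the two outputs diverge the same way there, that would point at the Vulkan backend itself rather than LM Studio's packaging of it.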


LM Studio Community org

Thanks for the report! I've escalated it :)

LM Studio Community org
edited Mar 19

@AEsau interesting. Thanks for reporting. Would you be able to share the copy-and-pasted results of right-clicking the gear icon next to the model so that we can see what config you're using when you see this issue?

Here you go @mattjcly :

appVersion: 0.3.13
appBuildVersion: "1"
modelPath: lmstudio-community/gemma-3-1b-it-GGUF/gemma-3-1b-it-Q4_K_M.gguf
prediction:
  layers:
    - layerName: hardware
      config:
        fields: []
    - layerName: modelDefault
      config:
        fields:
          - key: llm.prediction.promptTemplate
            value: <Default prompt template omitted for brevity>
          - key: llm.prediction.llama.cpuThreads
            value: 4
    - layerName: userModelDefault
      config:
        fields:
          - key: llm.prediction.promptTemplate
            value:
              type: jinja
              jinjaPromptTemplate:
                template: >
                  {{ bos_token }}

                  {%- if messages[0]['role'] == 'system' -%}
                      {%- if messages[0]['content'] is string -%}
                          {%- set first_user_prefix = messages[0]['content'] + '

                  ' -%}
                      {%- else -%}
                          {%- set first_user_prefix = messages[0]['content'][0]['text'] + '

                  ' -%}
                      {%- endif -%}
                      {%- set loop_messages = messages[1:] -%}
                  {%- else -%}
                      {%- set first_user_prefix = "" -%}
                      {%- set loop_messages = messages -%}
                  {%- endif -%}

                  {%- for message in loop_messages -%}
                      {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
                          {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
                      {%- endif -%}
                      {%- if (message['role'] == 'assistant') -%}
                          {%- set role = "model" -%}
                      {%- else -%}
                          {%- set role = message['role'] -%}
                      {%- endif -%}
                      {{ '<start_of_turn>' + role + '
                  ' + (first_user_prefix if loop.first else "") }}
                      {%- if message['content'] is string -%}
                          {{ message['content'] | trim }}
                      {%- elif message['content'] is iterable -%}
                          {%- for item in message['content'] -%}
                              {%- if item['type'] == 'image' -%}
                                  {{ '<start_of_image>' }}
                              {%- elif item['type'] == 'text' -%}
                                  {{ item['text'] | trim }}
                              {%- endif -%}
                          {%- endfor -%}
                      {%- else -%}
                          {{ raise_exception("Invalid content type") }}
                      {%- endif -%}
                      {{ '<end_of_turn>
                  ' }}

                  {%- endfor -%}

                  {%- if add_generation_prompt -%}
                      {{'<start_of_turn>model
                  '}}

                  {%- endif -%}
                bosToken: <bos>
                eosToken: <eos>
                inputConfig:
                  messagesConfig:
                    contentConfig:
                      type: array
                      textFieldName: text
                  useTools: false
              stopStrings: []
          - key: llm.prediction.maxPredictedTokens
            value:
              checked: false
              value: 2048
    - layerName: instance
      config:
        fields: []
    - layerName: conversationSpecific
      config:
        fields:
          - key: llm.prediction.llama.cpuThreads
            value: 6
          - key: llm.prediction.temperature
            value: 0.5
load:
  layers:
    - layerName: currentlyLoaded
      config:
        fields:
          - key: llm.load.llama.cpuThreadPoolSize
            value: 6
          - key: llm.load.contextLength
            value: 4096
          - key: llm.load.llama.acceleration.offloadRatio
            value: 1
    - layerName: instance
      config:
        fields: []
hardware:
  gpuSurveyResult:
    result:
      code: Success
      message: ""
    gpuInfo:
      - name: Radeon RX 5500 XT
        deviceId: 0
        totalMemoryCapacityBytes: 42893049856
        dedicatedMemoryCapacityBytes: 8573157376
        integrationType: Discrete
        detectionPlatform: Vulkan
        detectionPlatformVersion: 1.3.283
        otherInfo: {}
  cpuSurveyResult:
    result:
      code: Success
      message: ""
    cpuInfo:
      name: ""
      architecture: x86_64
      supportedInstructionSetExtensions:
        - AVX
        - AVX2
selectedRuntimes:
  - modelCompatibilityType: gguf
    runtime:
      name: llama.cpp-win-x86_64-vulkan-avx2
      version: 1.21.0
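
For what it's worth, the userModelDefault prompt template above is the standard Gemma chat format. Here's a small sketch that renders it with stock jinja2, in case anyone wants to rule the template out; the raise_exception helper isn't built into Jinja, so it's supplied manually.

```python
# Render the chat template from the config with stock jinja2 to inspect the
# exact prompt string the runtime receives.
from jinja2 import Environment

def raise_exception(message):
    # The template calls this when user/assistant roles don't alternate.
    raise ValueError(message)

env = Environment()
env.globals["raise_exception"] = raise_exception

TEMPLATE = "..."  # paste the jinjaPromptTemplate text from the config here

prompt = env.from_string(TEMPLATE).render(
    bos_token="<bos>",
    messages=[{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
)
print(prompt)
# Expected shape: <bos><start_of_turn>user\nHello!<end_of_turn>\n<start_of_turn>model\n
```

Since the template renders identically regardless of runtime, a clean render here is consistent with the Vulkan backend, not the prompt formatting, being at fault.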
