1B-IT Model | Prone to hallucinations?
#1 opened by AEsau
I think I've pinned down the main issue: the Vulkan llama.cpp runtime, v1.21.0.
I switched over to the CPU-only engine and it's working without issue, albeit with slower inference.
I'm going to attempt another run with a previous Vulkan runtime engine to see if there are any differences, and will post notes here.
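When comparing runtimes like this, it can help to send the same deterministic request to LM Studio's OpenAI-compatible local server (default port 1234) once per runtime and diff the outputs. A minimal sketch, using only the standard library; the model identifier and `seed` support are assumptions, not confirmed from this thread:

```python
import json
import urllib.request

def build_request(prompt, temperature=0.0, seed=0):
    """Chat-completions payload for LM Studio's OpenAI-compatible local server.

    temperature=0 and a fixed seed keep sampling as deterministic as possible,
    so any difference between runtimes comes from the backend, not the sampler.
    """
    return {
        "model": "gemma-3-1b-it",  # identifier as loaded in LM Studio (assumption)
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "seed": seed,  # seed support depends on the backend (assumption)
        "max_tokens": 256,
    }

def run(prompt, url="http://localhost:1234/v1/chat/completions"):
    """POST the payload to the local server and return the model's reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Run once with the Vulkan runtime selected, once with CPU-only, and diff.
    print(run("List the planets of the solar system in order."))
```

Running the same prompt under each runtime and diffing the replies makes it easier to tell a backend regression from ordinary sampling variance.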
Thanks for the report! I've escalated it :)
@AEsau
Interesting, thanks for reporting. Would you be able to share the copy-and-pasted results of right-clicking the gear icon next to the model, so that we can see what config you're using when this issue occurs?
Here you go @mattjcly:
```yaml
appVersion: 0.3.13
appBuildVersion: "1"
modelPath: lmstudio-community/gemma-3-1b-it-GGUF/gemma-3-1b-it-Q4_K_M.gguf
prediction:
  layers:
    - layerName: hardware
      config:
        fields: []
    - layerName: modelDefault
      config:
        fields:
          - key: llm.prediction.promptTemplate
            value: <Default prompt template omitted for brevity>
          - key: llm.prediction.llama.cpuThreads
            value: 4
    - layerName: userModelDefault
      config:
        fields:
          - key: llm.prediction.promptTemplate
            value:
              type: jinja
              jinjaPromptTemplate:
                template: >
                  {{ bos_token }}
                  {%- if messages[0]['role'] == 'system' -%}
                  {%- if messages[0]['content'] is string -%}
                  {%- set first_user_prefix = messages[0]['content'] + '
                  ' -%}
                  {%- else -%}
                  {%- set first_user_prefix = messages[0]['content'][0]['text'] + '
                  ' -%}
                  {%- endif -%}
                  {%- set loop_messages = messages[1:] -%}
                  {%- else -%}
                  {%- set first_user_prefix = "" -%}
                  {%- set loop_messages = messages -%}
                  {%- endif -%}
                  {%- for message in loop_messages -%}
                  {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
                  {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
                  {%- endif -%}
                  {%- if (message['role'] == 'assistant') -%}
                  {%- set role = "model" -%}
                  {%- else -%}
                  {%- set role = message['role'] -%}
                  {%- endif -%}
                  {{ '<start_of_turn>' + role + '
                  ' + (first_user_prefix if loop.first else "") }}
                  {%- if message['content'] is string -%}
                  {{ message['content'] | trim }}
                  {%- elif message['content'] is iterable -%}
                  {%- for item in message['content'] -%}
                  {%- if item['type'] == 'image' -%}
                  {{ '<start_of_image>' }}
                  {%- elif item['type'] == 'text' -%}
                  {{ item['text'] | trim }}
                  {%- endif -%}
                  {%- endfor -%}
                  {%- else -%}
                  {{ raise_exception("Invalid content type") }}
                  {%- endif -%}
                  {{ '<end_of_turn>
                  ' }}
                  {%- endfor -%}
                  {%- if add_generation_prompt -%}
                  {{'<start_of_turn>model
                  '}}
                  {%- endif -%}
                bosToken: <bos>
                eosToken: <eos>
                inputConfig:
                  messagesConfig:
                    contentConfig:
                      type: array
                      textFieldName: text
                  useTools: false
              stopStrings: []
          - key: llm.prediction.maxPredictedTokens
            value:
              checked: false
              value: 2048
    - layerName: instance
      config:
        fields: []
    - layerName: conversationSpecific
      config:
        fields:
          - key: llm.prediction.llama.cpuThreads
            value: 6
          - key: llm.prediction.temperature
            value: 0.5
load:
  layers:
    - layerName: currentlyLoaded
      config:
        fields:
          - key: llm.load.llama.cpuThreadPoolSize
            value: 6
          - key: llm.load.contextLength
            value: 4096
          - key: llm.load.llama.acceleration.offloadRatio
            value: 1
    - layerName: instance
      config:
        fields: []
hardware:
  gpuSurveyResult:
    result:
      code: Success
      message: ""
    gpuInfo:
      - name: Radeon RX 5500 XT
        deviceId: 0
        totalMemoryCapacityBytes: 42893049856
        dedicatedMemoryCapacityBytes: 8573157376
        integrationType: Discrete
        detectionPlatform: Vulkan
        detectionPlatformVersion: 1.3.283
        otherInfo: {}
  cpuSurveyResult:
    result:
      code: Success
      message: ""
    cpuInfo:
      name: ""
      architecture: x86_64
      supportedInstructionSetExtensions:
        - AVX
        - AVX2
selectedRuntimes:
  - modelCompatibilityType: gguf
    runtime:
      name: llama.cpp-win-x86_64-vulkan-avx2
      version: 1.21.0
```
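For anyone wanting to sanity-check what that Jinja template actually feeds the model, here's a minimal pure-Python mirror of its logic, covering string contents only and skipping the image branch; the function name is mine, not part of LM Studio:

```python
def render_gemma3_prompt(messages, add_generation_prompt=True, bos_token="<bos>"):
    """Pure-Python mirror of the Gemma 3 Jinja chat template above.

    A system message (if any) is folded into the first user turn as a prefix;
    assistant turns are rendered with the role name "model".
    """
    out = [bos_token]
    first_user_prefix = ""
    if messages and messages[0]["role"] == "system":
        first_user_prefix = messages[0]["content"] + "\n"
        messages = messages[1:]
    for i, msg in enumerate(messages):
        # Same alternation check as the template's raise_exception branch.
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        role = "model" if msg["role"] == "assistant" else msg["role"]
        out.append(f"<start_of_turn>{role}\n")
        if i == 0:
            out.append(first_user_prefix)
        out.append(msg["content"].strip())
        out.append("<end_of_turn>\n")
    if add_generation_prompt:
        out.append("<start_of_turn>model\n")
    return "".join(out)
```

Rendering a single user turn this way shows the `<start_of_turn>`/`<end_of_turn>` framing the model expects, which is a quick way to rule out template problems before blaming the runtime.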