The model sometimes has trouble understanding who it's supposed to talk as. We're not sure if this is because of text completion instead of chat completion. If it decides to respond as the user's persona, you can't convince it otherwise; it's like it gets confused and lost in the text.
Give it a shot in chat completion (and if that works, check whether switching back to text completion while mimicking the sampler settings keeps it fixed). We suspect there may be some fuckery with the GLM4 template.
To debug it, please ensure that you are using the exact template shown in the repo (or the ST presets, if you use SillyTavern). Then check whether your backend incorrectly adds a BOS token: this model doesn't have one, and [gMASK]<sop> is not a BOS token config-wise. Some versions of llama.cpp and tabbyAPI add <|endoftext|> as BOS, which is incorrect. Also, if you are using KoboldCPP, please try a recent version of the llama.cpp server instead.
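As a quick sanity check, you can build the prompt by hand and confirm nothing gets prepended before the prefix. This is only a sketch of the approximate shape of the GLM4 template; the exact newline placement is an assumption, so verify it against the template in the repo:

```python
# Rough sketch of the GLM4 prompt shape (verify against the repo's
# exact template; newline placement here is an assumption).
def build_prompt(system: str, user: str) -> str:
    return (
        "[gMASK]<sop>"            # prompt prefix, NOT a BOS token
        f"<|system|>\n{system}"
        f"<|user|>\n{user}"
        "<|assistant|>\n"
    )

prompt = build_prompt("You are {{char}}.", "Hello!")
print(prompt.startswith("[gMASK]<sop>"))  # True: nothing before the prefix
print("<|endoftext|>" in prompt)          # False: no BOS was injected
```

If your backend's tokenized prompt shows an extra token before [gMASK]<sop>, that's the incorrect BOS injection described above.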