Seems to enter a loop when thinking

#1
by Stilgar - opened

With LM Studio 0.3.15 build 11, when asking a coding question, the "thinking" never ends and the model proposes the same code again and again... no issue with Qwen3-8B.
I'm working with an RTX 4090 and 96 GB of RAM.

LM Studio Community org

Can you give an example prompt?

from QWEN, "For thinking mode, use Temperature=0.6, TopP=0.95, TopK=20, and MinP=0 (the default setting in generation_config.json). DO NOT use greedy decoding, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the Best Practices section."
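For reference, the recommended thinking-mode defaults quoted above would look roughly like this as a generation_config.json fragment (a sketch: the key names follow the usual Hugging Face generation config conventions, and do_sample is my addition to reflect the "do not use greedy decoding" advice):

```json
{
  "do_sample": true,
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "min_p": 0.0
}
```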

I'm using a set of questions to evaluate models in my field (C# and WPF).

The prompt:

instantiate a viewmodel object and share it with other viewmodels using Inversion of control C# and wpf with community.MVVM.Toolkit

I left the base parameters:

Temperature 0.8
Top K 40
Repeat penalty 1.1
Top P 0.95
Min P 0.05

I gave up after 44 minutes of thinking; QwQ is clearly faster (for both I'm using Q8 quantization).

For example, I got this block of code twice, and the same happened for other parts:

protected override void OnStartup(StartupEventArgs e)
{
    base.OnStartup(e);

    var services = new ServiceCollection();

    // Register the shared ViewModel as a singleton
    services.AddSingleton<SharedViewModel>();

    // Register other ViewModels as needed
    services.AddTransient<MainViewModel>();
    services.AddTransient<ChildViewModel>();

    // Register the MainWindow and its ViewModel
    services.AddSingleton<MainWindow>();
    services.AddSingleton<MainViewModel>();

    // Build the service provider
    var serviceProvider = services.BuildServiceProvider();

    // Set the IoC default container
    CommunityToolkit.Mvvm.Ioc.Ioc.Default = serviceProvider;

    // Get the MainWindow instance from the container and show it
    var mainWindow = serviceProvider.GetRequiredService<MainWindow>();
    mainWindow.Show();
}
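A side note for anyone copying the generated code above: `Ioc.Default` is a read-only property in CommunityToolkit.Mvvm (namespace `CommunityToolkit.Mvvm.DependencyInjection`), so the direct assignment the model produced won't compile. A sketch of how the container is actually wired up with the toolkit (the ViewModel class names are carried over from the generated snippet):

```csharp
using System.Windows;
using CommunityToolkit.Mvvm.DependencyInjection;
using Microsoft.Extensions.DependencyInjection;

public partial class App : Application
{
    protected override void OnStartup(StartupEventArgs e)
    {
        base.OnStartup(e);

        var services = new ServiceCollection();
        services.AddSingleton<SharedViewModel>();
        services.AddTransient<ChildViewModel>();
        services.AddSingleton<MainViewModel>();
        services.AddSingleton<MainWindow>();

        // ConfigureServices is the toolkit's entry point;
        // Ioc.Default cannot be assigned to directly.
        Ioc.Default.ConfigureServices(services.BuildServiceProvider());

        // Ioc.Default implements IServiceProvider, so the usual
        // GetRequiredService extension method works here.
        Ioc.Default.GetRequiredService<MainWindow>().Show();
    }
}
```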

Edit
@sushistarlord
Using your preset solved the issue: it took 28 minutes and no more loop. But it hallucinates some functions and confuses the toolkit with MvvmLight (an older library). This confusion was common in the past, when MvvmLight was no longer supported (2 or 3 years ago) and CommunityToolkit.Mvvm replaced it. They seem to be using an old dataset for C#; it could be better with Python. It looks worse than Qwen2.5 Coder Instruct from my point of view.
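For readers who haven't used both libraries, the confusion is easy to spot in generated code: MvvmLight shipped its own SimpleIoc container, usually wrapped in a ViewModelLocator, whereas the toolkit sits on top of Microsoft.Extensions.DependencyInjection via Ioc.Default. A rough sketch of the old MvvmLight pattern the model keeps falling back to (MainViewModel is just an illustrative name):

```csharp
// MvvmLight (deprecated): registration goes through SimpleIoc,
// typically inside a ViewModelLocator class referenced from XAML.
using GalaSoft.MvvmLight.Ioc;

public class ViewModelLocator
{
    public ViewModelLocator()
    {
        SimpleIoc.Default.Register<MainViewModel>();
    }

    // Resolution uses SimpleIoc's GetInstance rather than an IServiceProvider.
    public MainViewModel Main => SimpleIoc.Default.GetInstance<MainViewModel>();
}
```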

@bartowski
Thank you for your incredible work, 90% of my GGUFs are from you :)

It is a common problem. As the thinking goes on, once the context window is consumed the AI starts losing the beginning of the task. That is how it gets trapped in the rolling context window.
Sometimes it even reaches the right answer, dismisses it, and continues on and on...
So carefully examine the context window: if you run out of context before the model settles on an answer, it is likely to fail as it loses track.
If you run out of memory when you try to extend the context window, you can try enabling K cache quantization and V cache quantization in the advanced settings when you load the model.

Also, are you sure you loaded the whole model into the 24 GB of GPU VRAM? How many tokens per second do you get? If you get below 30 per second, something is messed up.
Another proposed fix is the chain-of-draft prompting technique, since Qwen's reasoning is quite verbose.
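For what it's worth, chain of draft is applied purely through the prompt; the instruction from the original paper is roughly:

```
Think step by step, but only keep a minimum draft for each thinking step,
with 5 words at most. Return the answer at the end of the response.
```

It trades verbose reasoning for terse intermediate notes, which keeps more of the context window free for the actual task.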
