NewstaR/Newstar-Qwen3-0.6B-KTO

Overview

Newstar‑Qwen3‑0.6B‑KTO is a variant of Newstar‑Qwen3‑0.6B fine‑tuned using KTO (Kahneman‑Tversky Optimization). The KTO process adjusts model outputs based on preference data grounded in Kahneman‑Tversky behavioral patterns—focusing on risk attitudes, framing effects, and human decision biases.

This version remains in non‑thinking mode—built for consistent and bias‑aware responses, without reasoning or logic functions.

Test

Testing was done with the following parameters, so it's important to find what's best for your use case. Both models used the same parameters to ensure nobody gets a lead and it's fair:

Temperature: 0.7

Top P: 0.95

Top K: 40

Repetition Penalty: 1.1

Do Sample: True

Max New Tokens: 4096

Category	Prompt	Winner	Reason
CS (RAM vs. ROM)		KTO	KTO is clearer, more structured, and avoids inaccuracies like BASE’s claim about excessive RAM.
ENGINEERING (Water Filtration)		KTO	KTO provides a practical, scientifically grounded system; BASE is confusing and impractical.
MATH (Mean, Median, Mode)		KTO	KTO’s structured, concise explanation outperforms BASE’s wordy but accurate response.
SCIENCE (Osmosis vs. Diffusion)		KTO	KTO is more detailed and accurate despite a minor error; BASE oversimplifies and has vague examples.
WRITING (Lost Dog Story)	Write a short story about a lost dog finding its way home.	BASE	BASE focuses on the dog and partially meets the prompt; KTO is off-topic and incoherent.
CODING (Vowel Counting)	Create a simple program that counts the number of vowels in a sentence.	BASE	BASE’s program is more robust (handles uppercase/lowercase) and includes test cases; KTO misses uppercase vowels.
MATH SOLVING (Train Speed)	If a train travels 60 miles in 1.5 hours, what is its average speed?	KTO	Both are accurate, but KTO is more concise, delivering the result with less verbosity.
COMMON SENSE LOGIC (Ice Melting)	If you leave ice outside on a hot day, what happens to it?	KTO	KTO accurately describes melting; BASE’s sublimation claim is incorrect.
SOFT REASONING (Dog Barking)	If all dogs bark and Rex is a dog, does Rex bark? Why?	BASE	BASE provides a clearer affirmation despite flaws; KTO overcomplicates and undermines the premise.
RIDDLE (Keys and Locks)	What has keys but can’t open locks?	Neither	Both fail to identify the correct answer (piano) and provide irrelevant explanations.
GENERAL CHAT (Hobby)	Tell me about a hobby you enjoy.	BASE	BASE’s detailed, engaging piano description outperforms KTO’s brief, shallow list.
REWRITING (Formal Sentence)	Make this sentence more formal: “Can you fix the problem soon?”	KTO	KTO’s rewrite is concise and equally formal; BASE is wordy with unnecessary alternatives.
SUMMARIZATION (Tortoise and Hare)	Summarize the story of “The Tortoise and Hare” in two sentences.	KTO	KTO is accurate and concise; BASE has factual errors (e.g., ten-day race).
INSTRUCTION FOLLOWING (Vegetable Soup)	Explain how to prepare a simple vegetable soup that meets the following conditions: Use at least 3 different vegetables. The cooking time must not exceed 30 minutes. Include steps to make the soup both flavorful and healthy. Mention any kitchen tools needed. Provide alternatives if a vegetable is not available. Include tips to serve the soup nicely.	KTO	KTO adheres closely to the prompt with clear, healthy steps; BASE misinterprets and lacks clarity.
Overall		KTO	KTO wins 9 categories vs. BASE’s 4, showing greater accuracy, clarity, and adherence to prompts.

NewstaR
/

Newstar-Qwen3-0.6B-KTO

Overview

Test

Model tree for NewstaR/Newstar-Qwen3-0.6B-KTO

Dataset used to train NewstaR/Newstar-Qwen3-0.6B-KTO