Wur doomed!
What do you and the others think of the distilled R1 models for writing?
The llama3 / qwen models SFT'd on R1 outputs? I only tried 2 of them.
R1 Qwen (32b) - Lacks knowledge of fiction (same as the official Qwen release), so its writing is no better.
R1 Llama3 - This is generally the worst of them (not just for writing). It'll generate the CoT and then write something completely different.
CoT traces won't let the model do anything out of distribution, so not very useful if the base model doesn't have a lot in its training data.
Yeah, I have tried the same two and felt the same way.
I also felt that any attempt to add an R1 distill to the merge recipe of an existing merge project made it worse...so far...
@gghfez @BigHuggyD that has been my experience as well, which is a shame, as I had a go with R1 on OpenRouter and was blown away.
What model comes anywhere close that's usable on a 24GB VRAM machine with 32GB of RAM, in your experience?
There's nothing like it for now. I'm running R1 slowly on my ThreadRipper:
prompt eval time = 14026.61 ms / 918 tokens ( 15.28 ms per token, 65.45 tokens per second)
eval time = 398806.12 ms / 1807 tokens ( 220.70 ms per token, 4.53 tokens per second)
total time = 412832.73 ms / 2725 tokens
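For reference, here's a minimal sketch of that kind of CPU-heavy run via llama-cpp-python; the GGUF filename, context size, and thread count are assumptions rather than the exact setup that produced the timings above (verbose=True is what prints the same "prompt eval time / eval time / total time" block):

```python
# Minimal sketch of a CPU-heavy R1 run with llama-cpp-python.
# The GGUF path, context size, and thread count are placeholders/assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Q2_K.gguf",  # assumed quant filename
    n_ctx=8192,        # context window
    n_threads=32,      # CPU threads; tune for your ThreadRipper
    n_gpu_layers=0,    # CPU-only; raise this if you can offload layers to a GPU
    verbose=True,      # prints llama.cpp's timing block like the one above
)

out = llm("Write the opening scene of a grimdark fantasy story.", max_tokens=1024)
print(out["choices"][0]["text"])
```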
I tried training Wizard2 8x22b MoE on R1 data, but it doesn't really work well. It will plan ahead in think tags, e.g. (rough sample format sketched below the excerpt):
I need to ensure the story maintains its gritty, realistic tone without becoming overly melodramatic. The characters' growth should be subtle but significant. Also, the ending should leave a sense of hope but not be too neat—their redemption is fragile, and the future is uncertain.
Let me outline the next few chapters:
Chapter 5: Nightmares and Trust
...
But it doesn't backtrack like R1 does. Just kind of agrees with itself and ends up writing how it usually would:
“I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead.
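For anyone curious what "R1 data" with think tags looks like in practice, here's a rough sketch of one chat-style SFT sample; the field names, tag convention, and file layout are assumptions, not the actual dataset schema used above:

```python
# Rough sketch of one R1-distilled SFT sample, with the reasoning kept in
# <think> tags ahead of the visible reply. Field names and the exact tag
# convention are assumptions, not the dataset schema actually used.
import json

sample = {
    "messages": [
        {"role": "user", "content": "Continue the story. Keep the gritty, realistic tone."},
        {
            "role": "assistant",
            "content": (
                "<think>\n"
                "Keep the tone gritty without melodrama; the characters' growth should be\n"
                "subtle, and the ending should stay uncertain.\n"
                "Chapter 5: Nightmares and Trust\n"
                "...\n"
                "</think>\n"
                "The rain had not let up by morning..."
            ),
        },
    ]
}

# Append the sample to a JSONL training file.
with open("r1_distill_samples.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```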
lol
Ahhh, that's a shame :-(
"I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead."
Oh god!
I'll have to keep an eye on this thread.
I did enjoy Ppoyaa/MythoNemo-L3.1-70B-v1.0
But my tastes are probably not as refined as others on this thread ;-)
Yeah, tested on OpenRouter and asked for my usual Grimdark fantasy prompt, and by the end of the story one character was asking the other to "use the radio" to call their officer :D
lol...
I tested a couple of prompts and liked the prose (feels like they distilled Opus-4 for this one). I didn't check for continuity though and now it's getting hammered / times out.
It also likes to switch to Chinese like the older qwen models.
I had the smaller one do this a few times unless I put "Please respond in English only!" in the system prompt. Same thing happens with GLM-4 and GLM-Z. Qwen3-235b exl3 does it too sometimes.
I wonder why the Deepseek models don't have this issue (all the way back to their first releases years ago)
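A minimal sketch of one way to wire that system prompt into an OpenAI-compatible endpoint; the base URL, API key, and model id below are placeholders, not the exact setup being discussed:

```python
# Minimal sketch: pin the language via the system prompt on an OpenAI-compatible
# endpoint. base_url, api_key, and model id are placeholders/assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="qwen3-235b",  # assumed model id; use whatever your server exposes
    messages=[
        {"role": "system", "content": "Please respond in English only!"},
        {"role": "user", "content": "Continue the grimdark fantasy scene from before."},
    ],
)
print(resp.choices[0].message.content)
```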
I think it may just be a bad config on OpenRouter:
Yeah something seems off, it feels like Novita has something misconfigured. I'm getting abysmal prompt adherence, and also some Chinese replies to English queries.
(from their discord)
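If you want to rule that provider out while testing, here's a sketch of an OpenRouter request that skips it; the "provider" routing fields ("ignore", "allow_fallbacks") are from memory of OpenRouter's provider-routing docs and may have changed, and the model slug is an assumption:

```python
# Sketch: steer an OpenRouter request away from a suspect provider while testing.
# The "provider" routing fields are from memory of OpenRouter's docs and may have
# changed; the model slug is an assumption. Check the current provider-routing docs.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-r1",  # assumed slug; swap in the model you're testing
        "messages": [{"role": "user", "content": "Write a short grimdark fantasy scene."}],
        "provider": {"ignore": ["Novita"], "allow_fallbacks": True},
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```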
What's everyone's opinion on the best coding assistant model? Size is no issue. I need something that is effective with really large context (>200k).
I have an old project that I hired someone on UpWork to do a couple years ago and I am trying to resurrect it. I had some... challenges with the person I hired due to English not being his primary language. DeepSeek has helped me work through some of the issues but I am wondering if there is something better for this purpose that is still community
For local models, especially with a niche codebase, definitely Command-A hands down if you can run it. I find it better than Deepseek for old projects. But try it first, as I don't see it recommended much.
If you mean out of all models, Sonnet 4 or Opus 4 are the best, though I haven't tried Opus 4 at 200k context.
Gemini Pro 2.5 has worked well for me at over 200k, but I find its code style to be convoluted and harder to maintain.
Thanks! So the local stuff still can't hang with the private stuff. The original is written in C#
I personally haven't leveraged AI for programming much until recently. I must say that it is quite entertaining to forget to turn off my creativity prompts and presets. My assistant gets quite emotional and there is an immersive ambiance. :D
The original is written in C#
Same with the "old projects" I'm working with! This is where Comamnd-A really shines for me, as it's more familiar with the old libs and 4GL tooling associated with them, as well as being smart enough to pick up on people's quirky work arounds.
I must say that it is quite entertaining to forget to turn off my creativity prompts and presets
Yeah, I've had some funny responses when chatting about code with the wrong prompts. And with older models like Wizard, I was sending it dataset samples, and it would see some instructions within the dataset and start following them (suddenly started threatening me!)
If you're "chat-coding" / not hooking a coding tool up to the model (therefore prompt following and formatting/diff instructions don't matter), you could try qwen3, though it hasn't worked well for me.
Claude worked like a champ! Thanks!