Wur doomed!
What do you and the others think of the distilled R1 models for writing?
The llama3 / qwen models SFT'd on R1 outputs? I only tried 2 of them.
R1 Qwen (32b) - Lacks knowledge of fiction (same as the official Qwen release), so its writing is no better.
R1 Llama3 - This is generally the worst of them (not just for writing). It'll generate the CoT and then write something completely different.
CoT traces won't let the model do anything out of distribution, so they're not very useful if the base model doesn't have a lot in its training data.
Yeah, I have tried the same two and felt the same way.
I also felt that any attempt to add an R1 distill to the merge recipe of an existing merge project made it worse...so far...
@gghfez @BigHuggyD that has been my experience as well, which is a shame as I had a go of R1 on Openrouter and I was blown away.
What model that's anywhere close is usable on a 24GB VRAM machine with 32GB of RAM, in your experience?
There's nothing like it for now. I'm running R1 slowly on my ThreadRipper:
prompt eval time = 14026.61 ms / 918 tokens ( 15.28 ms per token, 65.45 tokens per second)
eval time = 398806.12 ms / 1807 tokens ( 220.70 ms per token, 4.53 tokens per second)
total time = 412832.73 ms / 2725 tokens
I tried training Wizard2 8x22b MoE on R1 data, but it doesn't really work well. It will plan ahead in think tags, e.g.:
I need to ensure the story maintains its gritty, realistic tone without becoming overly melodramatic. The characters' growth should be subtle but significant. Also, the ending should leave a sense of hope but not be too neat--their redemption is fragile, and the future is uncertain.
Let me outline the next few chapters:
Chapter 5: Nightmares and Trust
...
But it doesn't backtrack like R1 does. It just kind of agrees with itself and ends up writing how it usually would:
"I don't know what I want anymore," she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead.
lol
Ahhh, that's a shame :-(
"I donβt know what I want anymore,β she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead."
Oh god!
I'll have to keep an eye on this thread.
I did enjoy Ppoyaa/MythoNemo-L3.1-70B-v1.0
But my tastes are probably not as refined as others on this thread ;-)
Well that's a lot sleeker than I expected (haven't run a java desktop app for over a decade). The UI is actually very responsive.
Missing feature for me is the ability to click a token, and choose an alternative.
Yeah, until a couple of years ago I hadn't touched Java for 20+ years and still had horrible memories of it from the late-90s:
- Mainly due to the garbage collector refusing to collect anything - until every scrap of RAM and swapfile was consumed!!! :/
- Wasting hours (days) trying to get the GUI packer to work properly... It would eventually look fine, then on the uni Silicon Graphics machines be completely screwed up... Sigh.
I don't really want to go mad and start adding lots of dependencies, but it has quite a nice template library too:
https://www.stringtemplate.org/
https://github.com/antlr/stringtemplate4/blob/master/doc/cheatsheet.md
and this might be better to use than what I was planning to copy from mikupad and llama-cli.
I think I'll just stew over it for a few days and see what I think when fresh...
I don't think it would be that hard to make something similar to Novelcrafter with minimal dependencies, e.g.:
https://github.com/aomukai/Writingway
just went completely bonkers and added every imaginable Python library... I left it installing for several hours and it was still installing every version of PyTorch (why would it even need PyTorch???) and had to kill it...
It really rubs me up the wrong way that Novelcrafter is a web-app for no good reason, and other than looking pretty it's actually got very little substance under the hood.
faiss-cpu
langchain
langchain-core
langchain-openai
langchain-anthropic
langchain-google-genai
langchain-ollama
langchain-community
langchain-together
numpy==1.24.0
pydantic==2.9.2
PyQt5>=5.15.0
PyQtChart>=5.15.0
pyttsx3==2.90
requests==2.31.0
spacy==3.7.5
textstat
tiktoken
noisereduce
pyaudio
whisper
openai-whisper
pydub
moviepy
internetarchive
PyMuPDF
pymupdf4llm
pillow
demucs
soundfile
PyQtWebEngine
boilerpy3
spylls
ebooklib
beautifulsoup4
python-docx
https://github.com/aomukai/Writingway/blob/main/requirements.txt
Looking now, I think it must have been langchain that caused the worst problems, but IIRC this was written by a non-programmer, so I can't really blame them for getting carried away.
Yeah, I'm almost certain it's due to me tokenising like this:
Paragraph blah blah<EOS>Another paragraph<EOS>Yet another blah<EOS>...
but I had hoped that having an equal amount of chapters, with several newlines in them, would fix it.
Sadly though, it definitely seems to find it much easier to learn the prose differences when formatted like the above.
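As a rough sketch, the packing looks something like this (the tokenizer here is just illustrative, not necessarily the one I'm using):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B")  # illustrative choice

paragraphs = ["Paragraph blah blah", "Another paragraph", "Yet another blah"]

# pack as: para<EOS>para<EOS>para<EOS>... with no newlines between paragraphs
ids = []
for p in paragraphs:
    ids += tok(p, add_special_tokens=False)["input_ids"]
    ids += [tok.eos_token_id]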
Just had an idea how I might be able to get the best of both worlds out of this:
- Train on book paragraphs (class +1) and slop-stories paragraphs (class -1). Let it learn the better prose at the cost of mangling the newlines.
- Merge the Control Adapter, and then train on entire books (class +1) vs book paragraphs (class -1).
Then merge again, or better yet, combine the two LoRAs into a single LoRA for use.
If my latest attempt fails, then I'll try this next...
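For the "combine the two LoRAs into a single LoRA" step, it should just be block-concatenation along the rank dimension (a minimal sketch, assuming both adapters use the same alpha/rank scaling; otherwise fold each scale into its B first):

import torch

def combine_loras(A1, B1, A2, B2):
    # Stack two LoRAs (W += B1 @ A1 + B2 @ A2) into one rank-(r1 + r2) LoRA.
    A = torch.cat([A1, A2], dim=0)  # (r1 + r2, in_features)
    B = torch.cat([B1, B2], dim=1)  # (out_features, r1 + r2)
    return A, B  # B @ A == B1 @ A1 + B2 @ A2 exactly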
but IIRC this was written by a non-programmer,
Yeah, I recall him saying he's not a programmer.
pyaudio
demucs
soundfile
noisereduce
lol what?
I'm almost certain it's due to me tokenising like this
Probably a stupid question but have you tried detokenizing it to make sure it matches? I don't think Qwen has the issue but Voxtral (Mistral) was prefixing my prompts with a space when tokenizing, causing my LoRA runs to destroy the model before I fixed it.
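Something like this quick round-trip check is what caught it for me (model name just an example):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # example model

text = "Paragraph blah blah"
ids = tok(text, add_special_tokens=False)["input_ids"]
back = tok.decode(ids)
print(repr(back))  # a leading ' Paragraph...' means a prefix space snuck in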
Voxtral (Mistral) was prefixing my prompts with a space when tokenizing, causing my LoRA runs to destroy the model before I fixed it.
Extremely common mistral L
Seriously, why do they still stick with that confusing llama 2 chat template? Even something multi-token like Alpaca is superior to it.
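For reference, roughly from memory (so don't quote me on the exact whitespace):

# Llama-2-style chat prompt (roughly what Mistral inherited):
llama2_chat = "<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST] {assistant}</s>"

# Alpaca-style: more tokens, but at least readable:
alpaca = "### Instruction:\n{user}\n\n### Response:\n{assistant}"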
Yeah, there seems to be something strange going on, as it affects both command-r:35b and qwq:32b.
I can understand the paragraphs-between-EOS-tokens data causing problems:
head ajibawa-2023-Stories-Collections-paragraphs--filtered.json
{
"text": "One by one, they voiced their opinions, revealing both hope and fear, conviction and doubt. It became clear that this wasn't simply a business decision β it was a question of morality, legacy, and responsibility.",
"length": 212
},
{
"text": "We spent hours discussing every detail, from the cutting-edge technology behind the hybrid powertrain to the intricate aerodynamics designed to minimize drag. Then came the moment I had been waiting forβhe offered me a ride! With my heart pounding, I climbed aboard what felt like stepping onto a Formula One racetrack. That exhilarating experience ignited something within meβan appreciation for pushing boundaries and striving for greatness, just like the creators of such magnificent machines.",
"length": 496
},
head gutenberg-books-markdown-cleaned-fiction-paragraphs--filtered.json
{
"text": "\"Now there spoke old Louis XIV!\" laughed young Jerome Bonaparte. We both bowed, and he passed down with Annabel into the hall.",
"length": 126
},
{
"text": "\"You said you would like to hear my service in D flat--'Sharnall in D flat,' did you not? I will play it through to you now, if you care to listen. Of course, I can only give you the general effect, without voices, though, after all, I don't know that you won't get quite as good an idea of it as you could with any voices that we have here.\"",
"length": 342
},
but I'm now using 50% chapter / short-story data:
head ajibawa-2023-Stories-Collections--text-only.json
{
"text": "Once upon a time in the land of Policymia, there lived two leaders named Majora and Minoro. Their job was to make sure all the citizens had beautiful parks, clean water, and top-notch schools. But there were so many things to fix! How would they ever decide where to start?\n\nMajora, being the wise leader she was, knew just what to do. She invited her fellow policymakers for a big meeting at the Roundtable of Representatives. There, they discussed the most important problems Policymia faced. This was called identifying \"key policy areas.\" It meant figuring out which topics needed attention first.\n\nNext came assessing support β finding out if everyone agreed on the solutions. Some people thought building more playgrounds was the way to go, while others wanted better libraries. To understand everyone's thoughts, Majora used something called 'polling.' Just like taking a vote, polling helped her see what ideas were popular among her friends (the majority) and also those across the aisle (people who didn't belong to her political group).\n\nWhile talking to her friends and colleagues, Majora discovered that almost everyone loved science! And why not? Science could help create amazing inventions, protect nature, and even cure sicknesses. So, the policymakers decided to build a super cool SCIENCE CENTER right at the heart of Policymia!\n\nBut then, an unexpected problem popped up! A grumpy neighboring kingdom threatened to block Policymia's plans because they feared losing visitors to the new center. Oh no! However, instead of giving up, Majora saw this challenge as an opportunity. If they could work together with the grumpy neighbors, maybe both lands could benefit. That way, everybody wins, showing the true spirit of teamwork and collaboration in the world of policies!",
"length": 1789
},
{
"text": "In the bustling city of Brooklyn, there was a special movie screening for a new film called \"Cross Eyed.\" Billy and his friends were so excited to see it because they heard it was full of laughter and fun.\n\nAs soon as the movie started, they met peculiar characters who brought nonstop giggles. From talking animals to humans with unusual talents, each character had their own quirky charm. Every role, even those on the sidelines, felt important and added something special to the mix.\n\nBilly's favorite part of the movie involved little inventions called \"gadgets,\" created by a mad scientist named Professor Zoom. These gizmos popped up unexpectedly during scenes, adding layers of humor and excitement. With every appearance, the professor explained how these devices worked, teaching everyone about fascinating scientific principles.\n\nDuring recess, Billy couldn't stop thinking about the movie. He shared all he learned with his classmates, describing the gadgets and what they did. Together, they imagined creating their own silly contraptions while discussing forces, energy, and motion.\n\nUnfortunately, after hearing about the fantastic movie, none of Billy's friends could join him for a second viewing. Though disappointed, he realized that sharing knowledge with others can spread joy far beyond himself. Sometimes our discoveries don't turn out exactly as we hope, but learning valuable lessons along the way makes us grow stronger in both mind and spirit.",
"length": 1469
},
head gutenberg-books-markdown-cleaned-fiction-chapters--filtered.json
{
"text": "Emerson has written a discourse on friendship. It is beautifully worded, truly; it is full of a noble and high-minded philosophy. Doubtless it will appeal quite distinctly to those souls who, although yet on this earth-plane, have already partly cast off the mantle of flesh, and have found their paths to lie in the realm of spirit. Even to those, and it is by far the greater majority, who yet walk humdrumly along the world's great highway, the kingdom of the spirit perceived by them as in a glass darkly rather than by actual light shed upon them from its realm, it may bring some consolation during the absence of a friend. But for the general run of mankind it is set on too lofty a level. It lacks the warmth for which they crave, the personality and intercourse.\n\n\"I do then, with my friends as I do with my books,\" he says. \"I would have them where I can find them, but I seldom use them.\"\n\nNow, it is very certain that, for the majority of human beings, the friendliest books are worn 45 with much handling. If we picture for a moment the bookshelves belonging to our childish days, we shall at once mentally discover our old favourites. They have been used so often. They have been worn in our service. No matter how well we know the contents, we turn to them again and again; there is a very joy in knowing what to expect. Time does not age nor custom stale the infinite variety.\n\nThus it is in our childish days. And are not the majority of us still children? Should our favourite books be placed out of our reach, should it be impossible for us to turn their pages, it is certain that we would feel a loss, a gap. Were we old enough to comprehend Emerson's philosophy, we might endeavour to buoy ourselves up with the thought that thus we were at one with him in his nobility and loftiness of sentiment. And yet there would be something childish and pathetic in the endeavour, by reason of its very unreality. Certainly if Providence should, either directly or indirectly, separate us from our friends, by all means let us accept the separation bravely. It cannot destroy our friendship. But seldom to use our friends, from the apparently epicurean point of view of Emerson, would be a forced and unnatural doctrine to the majority, as unnatural as if a child should bury Hans Andersen's fairy tales for fear of tiring of them. It would savour more of present and actual distaste, than the love which fears its approach. There is the familiarity which 46 breeds contempt, truly; but there is also the familiarity which daily ties closer bonds, draws to closer union.\n\nAntony had established a friendship with the lady of the blue book. The book had been responsible for its beginning. With Emerson's definition of friendship he would probably have been largely in harmony; not so in his treatment of it. With the following, he would have been at one, with the exception of a word or so:--\"I must feel pride in my friend's accomplishments as if they were mine,--wild, delicate, throbbing property in his virtues. I feel as warmly when he is praised, as the lover when he hears applause of his engaged maiden. We over-estimate the conscience of our friend. His goodness seems better than our goodness, his nature finer, his temptations less. Everything that is his, his name, his form, his dress, books, and instruments, fancy enhances. 
Our own thought sounds new and larger from his mouth.\"\n\nMost true, Antony would have declared, if you will eliminate \"over-estimate,\" and substitute \"is\" for \"seems.\"\n\nUnlike Emerson, he made no attempt to analyse his friendship. He accepted it as a gift from the gods. Maybe somewhere in his inner consciousness, barely articulate even to his own heart, he dreamt of it as a foundation to something further. Yet for the present, the foundation sufficed. Death-letters--he laughed joyously at the coincidence--had 47 laid the first stone, and each day placed others in firm and secure position round it. The building was largely unconscious. It is the way with true friendship. The life, also, conduced to it. There are fewer barriers of convention on board ship than in any other mode of living. Mrs. Grundy, it is to be supposed, suffers from sea-sickness, and does not care for this method of travelling. In fact, it would appear that she seldom does travel, but chooses by preference small country towns, mainly English ones, for her place of residence.\n\nThe days were days of sunshine and colour, the changing colour of sea and sky; the nights were nights of mystery, veiled in purple, star-embroidered.\n\nOne day Pia made clear to him the explanation of her Irish colouring and her Italian surname. Her mother, she told him, was Irish; her father, English. Her baptismal name had been chosen by an Italian godmother. She was eighteen when she married the Duc di Donatello. On their wedding day, when driving from the church, the horses had bolted. She had been uninjured; he had received serious injuries to his head and spine. He had lived for seven years as a complete invalid, totally paralysed, but fully conscious. During those seven years, she had never left him. Two years previously he had died, and she had gone to live at her old home in England,--the Manor House, Woodleigh, which had been in the hands of 48 caretakers since her parents' death. Her husband's property had passed to his brother. The last six months she had been staying with a friend at Wynberg.\n\nShe told the little tale extremely simply. It never occurred to her to expect sympathy on account of the tragedy which had marred her youth, and by reason of which she had spent seven years of her life in almost utter seclusion. The fact was merely mentioned in necessary explanation of her story. Antony, too, had held silence. Sympathy on his part would have been somehow an intrusion, an impertinence. But he understood now, in part at least, the steady gravity, the hint of sadness in her eyes.\n\nThe name of Woodleigh awoke vague memories in his mind, but they were too vague to be noteworthy. Possibly, most probably, he told himself, he had merely read of the place at some time. She mentioned that it was in Devonshire, but curiously enough, and this was an omission which he noted later with some surprise, he never questioned her as to its exact locality.\n\nOn his side, he told her of his life on the veldt, and mentioned that he was returning to England on business. On the outcome of that same business would depend the question whether he remained in England, or whether he returned to the veldt. Having the solicitor's injunction in view, he naturally did not volunteer further information. Such details, too, sank into insignificance 49 before the more absorbing interest of personality. They are, after all, in a sense, mere accidents, and have no more to do with the real man than the clothes he wears. 
True, the manner in which one dons one's clothes, as the manner in which one deals with the accidental facts of life, affords a certain index to the true man; but the clothes themselves, and the accidental facts, appear, at all events, to be matters of fate. And if you can obtain knowledge of a man through actual contact with his personality, you do not trouble to draw conclusions from his method of donning his clothes. You may speculate in this fashion with regard to strangers, or mere acquaintances. You have a surer, and infinitely more interesting, fashion with your friends.\n\nLife around them moved on in the leisurely, almost indolent manner in which it does move on board a passenger ship. The younger members played quoits, cricket on the lower deck, and inaugurated concerts, supported by a gramaphone, the property of the chief officer, and banjo solos by the captain. The older members read magazines, played bridge, or knitted woollen articles, according to the promptings of their sex and their various natures, and formed audiences at the aforementioned concerts.\n\nAntony and the Duchessa di Donatello alone seemed somewhat aloof from them. They formed part of the concert audiences, it is true; but they 50 neither played bridge, quoits, nor cricket, nor knitted woollen articles, nor read magazines. The Duchessa employed her time with a piece of fine lace work, when she was not merely luxuriating in the sunshine, or conversing with Antony. Antony either conversed with the Duchessa, or sat in his deck chair, smoking and thinking about her. There was certainly a distinct sameness about the young man's occupation, which, however, he found not in the smallest degree boring. On the contrary, it was all-absorbing and fascinating. The very hours of the day were timed by the Duchessa's movements, rather than by the mere minute portions of steel attached to the face of a commonplace watch. Thus:--\n\nDawn. He realizes the Duchessa's existence when he wakes. (His dreams had been coloured by her, but that's beside the mark.)\n\nDaybreak. The Duchessa ascends on deck and smiles at him.\n\nBreakfast time. The Duchessa sits opposite to him.\n\nThe sunny morning hours. The Duchessa sews fine lace; she talks, she smiles,--the smile that radiates through the sadness of her eyes.\n\nAnd so on, throughout the day, till the evening gloaming brings a hint of further intimacy into their conversation, and night falls as she wishes him pleasant dreams before descending to her cabin.\n\nHe dwelt then, for the moment, solely in her 51 friendship, but vaguely the half articulate thought of the future began to stir within him, pulsing with a secret possibility of joy he barely dared to contemplate.\n\n***\n\n52",
"length": 9565
},
{
"text": "Zen hastened to manifest himself, complete with fourteen nostrils, before she could spoil everything. \"The procedure is most unorthodox,\" he murmured aloud, \"but truly this new incense has a most delicious aroma, extremely pleasing to My Ego. What is your will, oh, strangers?\"\n\n\"All-Merciful Zen,\" the princess pleaded, \"forgive them, for they knew not what they did. They did not mean to summon You.\"\n\n\"Then who,\" asked Zen in a terrible voice, \"is this wonderful smoke for? Some foreign god whom they worship on My Territory?\" And he wouldn't put it past them either.\n\nPeter looked at the anthropologist, but Kendrick was obviously too paralyzed with fright to speak. \"As a matter of fact, Your--er--Omnipotence,\" the physicist said haltingly, \"this is not part of our religious ritual. We burn this particular type of incense which we call tobacco, for our own pleasure.\"\n\n\"In other words,\" Zen said coldly, \"you worship yourselves. I work and slave My Godhood to the bone only to have egotists running all over My Planet.\"\n\n\"No, it's nothing like that at all,\" Kendrick quavered. \"We smoke the tobacco to--well--gratify our appetites. Like--like eating, you know.\"\n\n\"Well, you will have to forego that pleasure,\" Zen said, frowning terribly. Even the tall one cowered, he noted with appreciation. It had been a long time since people had really cringed before his frown. The Uxenach had come to take him too much for granted; they would learn their mistake. \"From now on,\" he said portentously, \"the tobacco must be reserved for My Use alone. Smoke it only for purposes of worship. Once a day will be sufficient,\" he added graciously, \"and perhaps twice on holy days.\"\n\n\"But we do not worship alien gods,\" Kendrick persisted in a shaky voice. \"Even if you *were* a god....\"\n\nZen frowned. \"Would you care to step outside and test my divinity?\"\n\n\"Well, no ... but....\"\n\n\"Then, as far as you're concerned, I am Divine, and let's have no more quibbling. Don't forget the tobacco once a day. About time I had a change from that low-grade incense.\"\n\nHe vanished. Too late he remembered that he'd planned to ask the Earthlings why they had come to Uxen, and to discuss a little business proposition with them. Oh, well, time for that at his next materialization for them. And, now that he considered the matter, the direct approach might very well be a mistake.\n\nHe hoped Iximi would make sure they burned him tobacco regularly--really good stuff; almost made godhood worthwhile. But then he'd felt that way about incense at first. No, he had other ideas for making divinity worthwhile, and Iximi was going to help him, even if she didn't know it. People had used him long enough; it was his turn to use them.",
"length": 2707
},
and I've been really careful to normalise it so that every paragraph is separated by double newlines, with no weird spaces at the start or end...
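The normalisation I mean is nothing fancy; roughly this (a sketch of the idea, not my exact code):

import re

def normalise(text: str) -> str:
    # split on blank-line runs, strip stray whitespace from each paragraph,
    # then rejoin with exactly one blank line between paragraphs
    paras = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n\n".join(paras)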
I'm also using 8 sets of data in total:
# =====================
# DATASET CONFIGURATION
# =====================
# POSITIVE CLASS DATA:
[[datasets]]
dataset_path = 'datasets/fiction-paragraphs/books/*.json'
control_class = 1
max_sequences = 15151 # floor(0.125*120000/(1-0.01))
drop_tails = true
[[datasets]]
dataset_path = 'datasets/fiction-paragraphs/gutenberg-books/*.json'
control_class = 1
max_sequences = 15151 # floor(0.125*120000/(1-0.01))
drop_tails = true
[[datasets]]
dataset_path = 'datasets/fiction-chapters/books/*.json'
control_class = 1
max_sequences = 15151 # floor(0.125*120000/(1-0.01))
[[datasets]]
dataset_path = 'datasets/fiction-chapters/gutenberg-books/*.json'
control_class = 1
max_sequences = 15151 # floor(0.125*120000/(1-0.01))
# NEGATIVE CLASS DATA:
[[datasets]]
dataset_path = 'datasets/fiction-paragraphs/ajibawa-2023-stories/*.json'
control_class = -1
max_sequences = 15151 # floor(0.125*120000/(1-0.01))
drop_tails = true
[[datasets]]
dataset_path = 'datasets/fiction-paragraphs/literotica-stories/*.json'
control_class = -1
max_sequences = 15151 # floor(0.125*120000/(1-0.01))
drop_tails = true
[[datasets]]
dataset_path = 'datasets/fiction-chapters/ajibawa-2023-stories/*.json'
control_class = -1
max_sequences = 15151 # floor(0.125*120000/(1-0.01))
[[datasets]]
dataset_path = 'datasets/fiction-chapters/literotica-stories/*.json'
control_class = -1
max_sequences = 15151 # floor(0.125*120000/(1-0.01))
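The max_sequences comments are just budgeting, by the way: each of the 8 datasets gets 1/8 of a ~120k-sequence budget, divided by (1 - 0.01) to pad it out by ~1%:

import math
print(math.floor(0.125 * 120000 / (1 - 0.01)))  # 15151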
so I'm really perplexed at where it can learn to screw up the ends of lines now.