Some feedback

#2
by rkfg - opened

Honestly, I'm not sure if this is even in the project's scope, but still. I compared the previous version 1.1 (not sure if it was made by the same people) with v2, and in my experience v2 is worse: more bland and less creative. However, I only tested these models in Russian (that's why I'm not sure it matters for the project, but it kinda does for me). The test was quite unusual: I wanted to see how creative the model can be by asking it to write a news article about complete nonsense. There's a Twitter/Telegram bot named "Neural Meduza" which regularly posts AI-generated fake news headlines; it was trained on Russian satirical news headlines and the like. They're usually funny by themselves, but there's nothing else, just the headlines. Some examples (translated): "A Voronezh citizen tried to pay his utility bills with ravioli", "Astronomers discovered a planet inhabited exclusively by clowns", "UN urged to ban the words 'yes' and 'no'", "MIA proposed to ban polygonal shapes", and so on.

So I made a simple bot in SillyTavern that writes whatever the user asks for (a rough sketch of the card is below). v1.1 produces very varied stories, news articles, and comments about them, and tries to rationalize and explain whatever insanity I throw at it. It also has a sort of "world model", as it clearly understands, for example, how an inanimate object should participate in a scene that usually only involves humans. In my case it was a cow-parsnip being awarded by the President, and it "cast its leaves down" (not eyes!), and so on: plenty of cases where a novel situation, one I don't believe was even remotely present in any dataset, is described correctly and with small, smart details. This was very refreshing and surprising! Especially since it was in my first language and not English, which of course always gets a lot more attention.
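
For reference, here's roughly what such a card looks like as data. This is a minimal sketch, not my exact card: the field names follow SillyTavern's character card format, but the prompt text is just illustrative.

```python
import json

# Minimal "write whatever the user asks" character card (illustrative sketch).
# Field names follow SillyTavern's character card format; the wording is a
# placeholder, not the exact card I used.
card = {
    "name": "News Writer",
    "description": (
        "A professional journalist who writes a full, earnest news article "
        "about any headline the user provides, no matter how absurd, "
        "rationalizing every detail with a straight face."
    ),
    "first_mes": "Give me a headline and I'll write the article.",
}

# Save as JSON so it can be imported as a character.
with open("news_writer.json", "w", encoding="utf-8") as f:
    json.dump(card, f, ensure_ascii=False, indent=2)
```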

Now, while v2 can also do that, it happens less often, and I need to expand my initial prompt to get anything interesting at all. I noticed that it uses a different prompt format too (ChatML for v2 and Mistral for v1.1); a sketch of both templates is below. I translated the system prompt as well (the actual instruction parts, not the tags, of course). I have yet to try English, because it's too much fun reading these crazy articles. And I understand that training quite often changes the model's balance: it becomes better at some things and worse at others. If supporting other languages isn't a priority, that's fine!
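
For clarity, here's roughly what the two templates look like, filled in with one of the bot's headlines. This is a sketch: the exact whitespace, BOS token, and system prompt placement are assumptions, so check each model's tokenizer config for the canonical form.

```python
# Hedged sketch of the two prompt formats mentioned above. Exact whitespace,
# BOS handling and system prompt placement may differ from each model's
# tokenizer config.

SYSTEM = "You are a journalist. Write a full news article for the given headline."
USER = "A Voronezh citizen tried to pay his utility bills with ravioli"

# ChatML (Magnum v2): every turn is wrapped in role tags.
chatml = (
    f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
    f"<|im_start|>user\n{USER}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

# Mistral instruct (v1.1): the system text is typically prepended to the
# first user turn inside [INST] ... [/INST].
mistral = f"<s>[INST] {SYSTEM}\n\n{USER} [/INST]"

print(chatml)
print(mistral)
```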

I also tried the official Nemo Instruct and, as expected, it was the worst: barely one paragraph of uninspired writing.

I'm extremely grateful that such a good model is now available openly. Before this there was just one Russian model, from Sberbank, and it was awfully trained, randomly switching to writing news articles no matter what it was doing before. And while LLaMA and co. can write in other languages, they never feel native; it reads more like a poor word-for-word translation from English, with no knowledge of the culture, idioms, metaphors, proverbs, etc. Nemo is the first model that's a pleasure to read, and Magnum is the first fine-tune that surprised me and made me laugh until I cried.

Anthracite org

thank you for such elaborate feedback and for taking time out of your day! we did not have Russian data in the training process for any of the models mentioned, and while Nemo did reasonably well in Russian in my own testing (the raw, untrained model), we didn't specifically target it. for Russian, Qwen-based models generally performed better, but Qwen did not release a similarly sized checkpoint in the 12B range, only 7B and an upscaled 32B; stepping all the way down to 7B, even if we trained it, would be a lot worse than 12B.

Further, training on multiple languages risks the model suddenly switching language mid-context or dropping words (the softmax-bottleneck issue that v1 Qwen suffered from, where it randomly replied with Chinese tokens), and since our main audience is English-speaking, that would significantly damage the model's performance.

An interim solution could be to use the translation features of SillyTavern; that way you get the English-level performance, with translation both ways (in essence, the round trip looks like the sketch below).
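
A rough sketch of that round trip, outside of SillyTavern's UI: `translate()` and `generate()` here are hypothetical placeholders, not real SillyTavern APIs.

```python
# Hedged sketch of the "translate both ways" workaround. SillyTavern does
# this through its translation extension; the idea reduces to this pipeline.

def translate(text: str, source: str, target: str) -> str:
    """Hypothetical stand-in for any machine-translation service."""
    raise NotImplementedError

def generate(prompt_en: str) -> str:
    """Hypothetical stand-in for a call to the English-tuned model."""
    raise NotImplementedError

def chat_in_russian(user_message_ru: str) -> str:
    prompt_en = translate(user_message_ru, source="ru", target="en")
    reply_en = generate(prompt_en)  # the model only ever sees English
    return translate(reply_en, source="en", target="ru")
```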

While I can't promise we will focus on any language other than English in the foreseeable future (so any Russian ability would probably be a happy side effect), I am working little by little on my own Russian RP model. I'm just short on time currently, but it's on my roadmap, so maybe it'll help your cause once it comes out. (and maybe by that time there'll be better models to base it on too!)

thanks again for such awesome feedback!

lucyknada changed discussion status to closed
