euclaise posted an update Feb 7
Memphis: Advancing language model reasoning without relying on proprietary model outputs

Memphis is a series of models that advances the state of human-data models, offering good performance without relying on proprietary model outputs (e.g. GPT-generated datasets). I've developed a new iterative finetuning procedure that improves the reasoning ability of these models beyond what is possible with SFT alone on the same data.

Currently, I've released two models: Memphis-CoT-3B and Memphis-scribe-3B.

To build these models, I created new datasets (a quick loading snippet follows the list):
- euclaise/reddit-instruct : A dataset of instruction/QA-like data scraped from Reddit. A curated version, filtered using Lilac and neural embedding models, is available at euclaise/reddit-instruct-curated
- euclaise/TinyCoT : TinyCoT is a meta-dataset that aggregates a variety of human-sourced reasoning data. It is a curated version of my earlier euclaise/MegaCoT dataset, whose 629k responses are cut down to 28k for TinyCoT; there's also an intermediate version, euclaise/MiniCoT, with 129k responses.
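
If you want to look at the data, both datasets load with the Hugging Face `datasets` library. A minimal sketch (split names and columns vary per dataset, so print the result to see what's there):

```python
from datasets import load_dataset

# Curated Reddit instruction/QA data and the TinyCoT reasoning meta-dataset
reddit_instruct = load_dataset("euclaise/reddit-instruct-curated")
tiny_cot = load_dataset("euclaise/TinyCoT")

# Print the DatasetDicts to see the available splits, columns, and row counts
print(reddit_instruct)
print(tiny_cot)
```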

Memphis-CoT is trained on reddit-instruct, a filtered version of oasst2 (sablo/oasst2_curated), and TinyCoT. Multiple iterations were performed on TinyCoT, while reddit-instruct and oasst2 were only used for the initial model.

Memphis-scribe further finetunes Memphis-CoT on more creative tasks, using 18 different datasets, including euclaise/WritingPrompts_curated, lemonilia/LimaRP, and more.

To prevent catastrophic forgetting, I used weight averaging between iterations.

- euclaise/Memphis-CoT-3B
- euclaise/Memphis-scribe-3B
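
Both models load with the standard `transformers` API. A minimal sketch (the prompt string and generation settings are placeholders; check the model cards for the intended prompt/CoT format):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "euclaise/Memphis-CoT-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt; see the model card for the exact prompt template
inputs = tokenizer("Q: What is 17 * 3?\nA:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```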

Amazing work 🤩
I wish we had a save button for posts here

I'm also interested to know more about this:
"To prevent catastrophic forgetting, I used weight averaging between iterations."
Can you please elaborate!? Tnx 🤗

Language models tend to 'forget' information and skills when finetuned for too long. One way to prevent this is, instead of keeping the updated weights directly, to average the newly finetuned weights with the previous weights at each epoch (or, in this case, each generation+finetuning cycle), and to continue from that average rather than the raw update.
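
Concretely, something like this (a minimal PyTorch sketch of interpolating state dicts; the 0.5 mixing weight and the helper name are illustrative, not the exact recipe used for Memphis):

```python
import torch

def average_weights(prev_state, new_state, alpha=0.5):
    """Blend the freshly finetuned weights with the previous iteration's weights.

    alpha=0.5 is a plain average; treat the mixing ratio as a hyperparameter.
    """
    averaged = {}
    for name, prev in prev_state.items():
        new = new_state[name]
        if torch.is_floating_point(prev):
            averaged[name] = alpha * new + (1 - alpha) * prev
        else:
            # Non-float entries (e.g. integer buffers) are carried over as-is
            averaged[name] = new
    return averaged

# After each generation+finetuning cycle:
#   prev_state = copy of model.state_dict() before the cycle
#   new_state  = model.state_dict() after finetuning
#   model.load_state_dict(average_weights(prev_state, new_state))
```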