not the latest tech
We need to find a new formula; this is old technology and the results are not the best. Models can't correctly remember everything they were taught. Mixture-of-experts (MoE) models, where each of the many experts is not overloaded with information, work well compared to a single MLP overloaded with everything. This old tech would work better if external info (depending on the question) were loaded into the context window, but current inference players are not advanced enough to auto-load that external info. A 7B model alone, with the current formula/scheme, will not achieve anything more; with more training or not, nothing will change.
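For what it's worth, the MoE point above (each expert holds only a slice of the knowledge, and only a few experts run per token) can be sketched like this. The shapes, the top-2 gating, and the linear experts are my own illustrative assumptions, not any particular model's config:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_routing(x, gate_w, k=2):
    """Pick the k experts with the highest gate score for this token."""
    logits = x @ gate_w                       # one score per expert
    top = np.argsort(logits)[::-1][:k]        # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over the chosen experts only
    return top, weights

d_model, n_experts = 8, 4
gate_w = rng.normal(size=(d_model, n_experts))
# Each expert is a toy linear layer; in a real MoE these are full MLPs.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

x = rng.normal(size=d_model)                  # one token's hidden state
idx, w = top_k_routing(x, gate_w, k=2)
# Only the 2 selected experts compute; the other 2 stay idle for this token.
y = sum(wi * (x @ experts[i]) for wi, i in zip(w, idx))
```

The point of the sketch: per token, only 2 of the 4 expert matrices are touched, so each expert only ever has to "remember" the subset of data the router sends its way.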
Just for fun, train it on 200T-300T tokens and compare with a 20T-token model.
https://huggingface.co/datasets/institutional/institutional-books-1.0
983K books, published largely in the 19th and 20th centuries
242B o200k_base tokens