Minish

non-profit

https://minish.ai/

minishlab

MinishLab

AI & ML interests

small models

Organization Card

Hello, we're Minish!

We're a two-person (@pringled and @stephantul) open-source lab, with a focus on Natural Language Processing.

We believe that if you make models fast enough, you unlock new possibilities.

Using our software, you can:

Embed the entire English Wikipedia in 5 minutes
Classify tens of thousands of documents per second on a CPU
Approximately deduplicate extremely large datasets in minutes
Build the fastest RAG application in the world
Easily evaluate which ANN algorithm works best for your data

Our projects:

model2vec: tiny static embedding models with state-of-the-art performance.
potion: the best small models in the world. 100-500x faster than a sentence-transformer, and almost as good.
vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
semhash: lightning-fast, super accurate semantic deduplication and filtering for your text datasets.
model2vec-rs: a Rust port of model2vec.
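
To give a feel for why static embeddings are fast and how they enable approximate deduplication, here is a minimal sketch in plain Python. It is a toy illustration under stated assumptions, not the model2vec or semhash API: the vocabulary, the three-dimensional vectors, the `embed` and `deduplicate` helpers, and the 0.9 threshold are all made up for this example. The idea it shows is that a static model is a lookup table from tokens to fixed vectors, a text embedding is the average of those vectors, and near-duplicates are texts whose embeddings have high cosine similarity.

```python
import math

# Hypothetical toy vocabulary: each token maps to a fixed (static) vector.
# Real static embedding models have far larger vocabularies and dimensions.
TOY_VECTORS = {
    "fast": [1.0, 0.0, 0.0],
    "quick": [0.9, 0.1, 0.0],
    "models": [0.0, 1.0, 0.0],
    "embeddings": [0.0, 0.8, 0.2],
    "pizza": [0.0, 0.0, 1.0],
}
DIM = 3


def embed(text):
    """Average the static vectors of known tokens; unknown tokens are skipped."""
    vectors = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def deduplicate(texts, threshold=0.9):
    """Keep a text only if it is below the threshold against every kept text."""
    kept, kept_vectors = [], []
    for text in texts:
        vector = embed(text)
        if all(cosine(vector, other) < threshold for other in kept_vectors):
            kept.append(text)
            kept_vectors.append(vector)
    return kept


if __name__ == "__main__":
    print(deduplicate(["fast models", "quick models", "pizza"]))
```

This greedy loop compares each text against everything already kept, which is quadratic in the worst case; at scale, an approximate nearest neighbor index would typically replace the inner comparison.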

You can also find us on: 🔬 GitHub 👽 LinkedIn 💬 Discord

Collections 2

spaces 1

Semantic Deduplication

Deduplicate Hugging Face datasets in seconds

models 13

minishlab/potion-multilingual-128M

0.1B • Updated May 31 • 9.64k • 22

minishlab/potion-8m-edu-classifier

0.0B • Updated May 14 • 8 • 2

minishlab/potion-retrieval-32M

0.0B • Updated Jan 29 • 4.91k • 19

minishlab/potion-base-32M

0.0B • Updated Jan 29 • 11.3k • 16

minishlab/potion-science-8M

0.0B • Updated Jan 21 • 10 • 2

minishlab/potion-science-32M

0.0B • Updated Jan 21 • 190 • 2

minishlab/M2V_base_glove_subword

0.1B • Updated Jan 21 • 73 • 2

minishlab/M2V_base_glove

0.1B • Updated Jan 21 • 27 • 4

minishlab/M2V_base_output

0.0B • Updated Jan 21 • 231k • 10

minishlab/M2V_multilingual_output

0.1B • Updated Jan 21 • 2.52k • 19

datasets 2

minishlab/my-vicinity-repo

Viewer • Updated Mar 2 • 5 • 33 • 2

minishlab/tokenlearn_C4

Updated Oct 29, 2024 • 14 • 2