Dominik Weckmüller

do-me

https://do-me.github.io/SemanticFinder/

AI & ML interests

Making AI more accessible. Working on semantic search, embeddings and Geospatial AI applications. https://geo.rocks

Recent Activity

liked a model 11 days ago

mlx-community/gemma-3-27b-it-qat-4bit

liked a dataset about 1 month ago

NUS-UAL/global-streetscapes

updated a dataset about 2 months ago

do-me/Geonames

Posts 7

Post

New app built on https://huggingface.co/docs/transformers.js and minishlab/potion-6721e0abd4ea41881417f062!
It uses these highly performant CPU-only models to calculate semantic similarity for Excel or CSV tables, fully client-side.
- App: https://do-me.github.io/semantic-similarity-table/
- Code: https://github.com/do-me/semantic-similarity-table
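
For readers who have not worked with embeddings: the core of a semantic similarity calculation is usually a cosine similarity between vectors. The sketch below is not the app's code. It assumes each table row has already been turned into an embedding vector by some model, and the function names `cosine_similarity` and `rank_rows` are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as "not similar".
        return 0.0
    return dot / (norm_a * norm_b)


def rank_rows(query_vec, row_vecs):
    """Return (row_index, score) pairs, most similar row first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(row_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional "embeddings" standing in for real model output.
rows = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranking = rank_rows([1.0, 0.0], rows)
```

In a real pipeline the toy vectors would be replaced by the output of an embedding model; the ranking step itself stays the same.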

Post

What are your favorite text chunkers/splitters?
Mine are:
- https://github.com/benbrandt/text-splitter (Rust/Python, battle-tested, Wasm version coming soon)
- https://github.com/umarbutler/semchunk (Python, really performant but some issues with huge docs)

I tried the huge Jina AI regex, but it failed for my (admittedly messy) documents, e.g. from EUR-LEX. Their free segmenter API is really cool but unfortunately times out on my huge docs (~100 pages): https://jina.ai/segmenter/

Also, I tried to write a vanilla JS chunker with a simple, adjustable hierarchical logic (inspired by the above). I think it does a decent job for so few lines of code: https://do-me.github.io/js-text-chunker/
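
To make "hierarchical logic" concrete, here is a minimal Python sketch of one common way to do it: try coarse separators first (paragraphs), fall back to finer ones (lines, sentences, words) only for pieces that are still too long, and hard-cut as a last resort. This is an assumption about the general technique, not a port of the JS chunker linked above. The function name `split_hierarchical`, the separator order, and the character-based limit are all illustrative choices.

```python
def split_hierarchical(text, max_chars, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_chars characters.

    Separators are tried from coarsest to finest. Adjacent pieces are
    merged greedily as long as the merged chunk still fits.
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # This separator does not occur; try the next finer one.
        return split_hierarchical(text, max_chars, finer)

    chunks, current = [], ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current.strip():
            chunks.append(current)
        if len(part) <= max_chars:
            current = part
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(split_hierarchical(part, max_chars, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks


words = split_hierarchical("aaa bbb ccc", max_chars=7)
```

Making the separator list and `max_chars` parameters is what keeps the logic adjustable: messy documents may need different separators than clean prose.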

Happy to hear your thoughts!