Benjamin Minixhofer

benjamin

https://github.com/bminixhofer

AI & ML interests

NLP, Efficiency, Machine Learning in Rust, Multilinguality, Transfer Learning

Recent Activity

authored a paper 12 days ago

Bolmo: Byteifying the Next Generation of Language Models

submitted a paper 15 days ago

Bolmo: Byteifying the Next Generation of Language Models

updated a dataset 15 days ago

allenai/bolmo_mix

authored a paper 12 days ago

Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published 20 days ago • 14

submitted a paper to Daily Papers 15 days ago

Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published 20 days ago • 14

authored 2 papers 9 months ago

Retrofitting (Large) Language Models with Dynamic Tokenization

Paper • 2411.18553 • Published Nov 27, 2024 • 2

Cross-Tokenizer Distillation via Approximate Likelihood Matching

Paper • 2503.20083 • Published Mar 25, 2025 • 1

authored 5 papers over 1 year ago

Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation

Paper • 2406.16678 • Published Jun 24, 2024 • 16

Zero-Shot Tokenizer Transfer

Paper • 2405.07883 • Published May 13, 2024 • 5

Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation

Paper • 2305.18893 • Published May 30, 2023 • 2

CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models

Paper • 2305.14214 • Published May 23, 2023

HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response

Paper • 2210.04573 • Published Oct 10, 2022

authored a paper about 2 years ago

WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models

Paper • 2112.06598 • Published Dec 13, 2021 • 1