RexBERT-mini
An efficient English encoder-only model (masked language model) with an ~8k-token context, targeted at e-commerce and retail NLP.
Model summary
- Model type: ModernBertForMaskedLM (encoder-only, masked-language modeling head)
- Domain: e-commerce / retail / shopping
- Language: English
- Context length: up to 7,999 tokens (config max_position_embeddings=7999; the ModernBERT architecture supports up to 8,192)
- License: Apache-2.0
Intended uses & limitations
Direct use
- Fill-mask and cloze completion (e.g., product titles, attributes, query reformulation).
- Embeddings / feature extraction for classification, clustering, retrieval re-ranking, and semantic search over retail catalogs and queries, via pooled encoder states (ModernBERT is a drop-in BERT-style encoder); a sketch follows this list.
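A minimal embedding sketch, assuming mean pooling over the last hidden states (matching the config's classifier_pooling = "mean"); the query and product titles are illustrative only:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)  # base encoder, no MLM head

def embed(texts):
    """Attention-mask-aware mean pooling of the last hidden states."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (B, L, 512)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, L, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (B, 512)

query = embed(["noise cancelling bluetooth earbuds"])
titles = embed(["Wireless Earbuds with Active Noise Cancellation",
                "Stainless Steel Kitchen Knife Set"])
print(F.cosine_similarity(query, titles))  # the earbuds title should score higher
```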
Downstream use
- Fine-tune for product categorization, attribute extraction, NER, intent classification, and retrieval-augmented ranking tasks in commerce search & browse, using a task head or pooled embeddings (see the sketch below).
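A minimal fine-tuning setup sketch, assuming a hypothetical three-way product-categorization task (the label set and example title are illustrative, not part of this release):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "thebajajra/RexBERT-mini"
labels = ["electronics", "apparel", "home"]  # hypothetical category set
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# The classification head is freshly initialized; fine-tune it with
# transformers' Trainer or a custom loop on (text, category) pairs.
batch = tok(["Noise cancelling wireless earbuds"], return_tensors="pt")
print(model(**batch).logits.shape)  # torch.Size([1, 3])
```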
Out-of-scope / not recommended
- Autoregressive text generation or chat; this is not a decoder LLM. Use decoder or seq2seq models for long-form generation.
How to get started
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "The customer purchased a [MASK] with free shipping."
inputs = tok(text, return_tensors="pt")
logits = model(**inputs).logits

# Top-5 candidates at the [MASK] position
mask_pos = (inputs["input_ids"] == tok.mask_token_id).nonzero(as_tuple=True)[1]
print(tok.convert_ids_to_tokens(logits[0, mask_pos[0]].topk(5).indices.tolist()))
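The fill-mask pipeline wraps the same steps; a shorter variant of the example above:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="thebajajra/RexBERT-mini")
for pred in fill("The customer purchased a [MASK] with free shipping.", top_k=5):
    print(pred["token_str"], round(pred["score"], 3))
```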
Model details
Architecture (from config)
- Backbone: ModernBERT (model_type: "modernbert", architectures: ["ModernBertForMaskedLM"])
- Layers / heads / width: 19 encoder layers, 8 attention heads, hidden size 512; intermediate (MLP) size 768; GELU activations.
- Attention: Local window of 128 tokens with global attention every 3 layers; RoPE θ=160k (local & global).
- Positional strategy: position_embedding_type: "sans_pos" (no learned absolute position embeddings; positions come from RoPE).
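These values can be read straight from the published config; the field names below follow the Transformers ModernBertConfig class, and the expected values in the comments come from the summary above:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("thebajajra/RexBERT-mini")
print(cfg.num_hidden_layers, cfg.num_attention_heads, cfg.hidden_size)  # 19 8 512
print(cfg.intermediate_size, cfg.max_position_embeddings)               # 768 7999
print(cfg.local_attention, cfg.global_attn_every_n_layers)              # 128 3
```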
Training data & procedure
Performance Highlights
MLM → Token Classification (E-Commerce)
RexBERT-mini outperforms DistilBERT on all token-classification tasks, evaluated on both Product Title and Product Description inputs.
Technical notes for practitioners
- Pooling: Use mean pooling over the last hidden states (the config's classifier_pooling is "mean"), or task-specific pooling.
- Long sequences: Leverage the extended context for product pages, multi-turn queries, or concatenated fields; ModernBERT uses efficient attention and RoPE for long inputs (see the sketch after this list).
- Libraries: Tested with transformers>=4.48.0.
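A sketch of long-input handling, assuming several catalog fields are concatenated into one document (the repeated sentence below just stands in for a long product page); pooling then follows the same mean-pooling recipe as the embedding sketch under Direct use:

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
cfg = AutoConfig.from_pretrained(model_id)

# Stand-in for a long product page: title, bullets, description, reviews.
page = "\n".join(["Wireless earbuds with active noise cancellation."] * 1200)

# Truncate at the configured maximum positions rather than BERT's usual 512.
enc = tok(page, truncation=True, max_length=cfg.max_position_embeddings)
print(len(enc["input_ids"]))  # up to 7999 tokens in a single sequence
```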
Model sources
- Hugging Face: thebajajra/RexBERT-mini (https://huggingface.co/thebajajra/RexBERT-mini)
- Background on ModernBERT: https://huggingface.co/docs/transformers/en/model_doc/modernbert
Citation
If you use this model, please cite the repository:
@software{rexbert_mini_2025,
  title  = {RexBERT-mini},
  author = {Bajaj, Rahul},
  year   = {2025},
  url    = {https://huggingface.co/thebajajra/RexBERT-mini}
}
Contact & maintenance
Author(s): Rahul Bajaj
Issues / questions: Open an issue or discussion on the HF model page.