RexBERT-mini

An efficient English encoder-only masked-language model with a ~8k-token context window, targeted at e-commerce and retail NLP.


Model summary

  • Model type: ModernBertForMaskedLM (encoder-only, masked-language modeling head)
  • Domain: e-commerce/retail/shopping
  • Language: English
  • Context length: 7,999–8,192 tokens (config max_position_embeddings=7999; ModernBERT supports up to 8,192)
  • License: Apache-2.0

Intended uses & limitations

Direct use

  • Fill-mask and cloze completion (e.g., product titles, attributes, query reformulation).
  • Embeddings / feature extraction for classification, clustering, retrieval re-ranking, and semantic search over retail catalogs and queries (via pooled encoder states); ModernBERT is a drop-in BERT-style encoder. A pooling sketch follows this list.
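
A minimal sketch of mean-pooled embeddings for similarity search; the pooling recipe and product strings are illustrative, not an official embedding pipeline for this model:

import torch
from transformers import AutoTokenizer, AutoModel

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)  # backbone without the MLM head

def embed(texts):
    # Mean-pool the last hidden states, ignoring padding positions
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # (B, T, 1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)            # (B, H)
    return torch.nn.functional.normalize(pooled, dim=-1)

query = embed(["wireless noise cancelling headphones"])
docs = embed(["Sony WH-1000XM5 Wireless Headphones", "Stainless steel water bottle 1L"])
print(query @ docs.T)  # cosine similarities (rows are L2-normalized)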

Downstream use

  • Fine-tune for product categorization, attribute extraction, NER, intent classification, and retrieval-augmented ranking in commerce search & browse, using a task head or pooled embeddings. A minimal sketch follows below.
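
A minimal fine-tuning sketch for product categorization, assuming a sequence-classification head via AutoModelForSequenceClassification; the three-way label set is hypothetical:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
clf = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=3,  # hypothetical 3-way category scheme
    id2label={0: "electronics", 1: "apparel", 2: "home"},
    label2id={"electronics": 0, "apparel": 1, "home": 2},
)

batch = tok(["usb-c fast charger 65w"], return_tensors="pt")
print(clf(**batch).logits.shape)  # torch.Size([1, 3]); train with Trainer or a custom loop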

Out-of-scope / not recommended

  • Autoregressive text generation or chat; this is not a decoder LLM. Use decoder or seq2seq models for long-form generation.

How to get started

from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = f"The customer purchased a {tok.mask_token} with free shipping."
inputs = tok(text, return_tensors="pt")
logits = model(**inputs).logits

# Top-5 predictions at the masked position
mask_pos = (inputs["input_ids"] == tok.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print(tok.convert_ids_to_tokens(top_ids.tolist()))
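
Equivalently, the high-level fill-mask pipeline wraps tokenization, masking, and top-k decoding; this sketch assumes the tokenizer's mask token is the literal [MASK] string:

from transformers import pipeline

fill = pipeline("fill-mask", model="thebajajra/RexBERT-mini")
for cand in fill("The customer purchased a [MASK] with free shipping.", top_k=5):
    print(cand["token_str"], round(cand["score"], 3))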

Model details

Architecture (from config)

  • Backbone: ModernBERT (model_type: "modernbert", architectures: ["ModernBertForMaskedLM"])
  • Layers / heads / width: 19 encoder layers, 8 attention heads, hidden size 512; intermediate (MLP) size 768; GELU activations.
  • Attention: local window 128 with global attention every 3 layers; RoPE θ = 160k (local & global).
  • Positional strategy: position_embedding_type: "sans_pos". (These values can be checked against the config; see the sketch below.)
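
The figures above can be confirmed from the published config; the field names below follow transformers' ModernBertConfig and are assumed to apply here:

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("thebajajra/RexBERT-mini")
print(cfg.num_hidden_layers, cfg.num_attention_heads, cfg.hidden_size, cfg.intermediate_size)
print(cfg.max_position_embeddings)                           # context limit
print(cfg.local_attention, cfg.global_attn_every_n_layers)   # local window / global cadence
print(cfg.local_rope_theta, cfg.global_rope_theta)           # RoPE bases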

Training data & procedure


Performance Highlights

MLM – Token Classification (E-Commerce)

RexBERT-mini outperforms DistilBERT on all token classification tasks.

Product Title: results figure on the model page.

Product Description: results figure on the model page.


Technical notes for practitioners

  • Pooling: use mean pooling over the last hidden states (the config's classifier_pooling is "mean"), or task-specific pooling.
  • Long sequences: leverage the extended context for product pages, multi-turn queries, or concatenated fields; ModernBERT uses efficient local/global attention and RoPE for long inputs (see the sketch after this list).
  • Libraries: tested with transformers>=4.48.0.
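
A minimal sketch of encoding one long, concatenated product page; the field text is illustrative, the separator assumes the tokenizer exposes a BERT-style sep_token, and length is capped at the config's 7,999-token limit:

import torch
from transformers import AutoTokenizer, AutoModel

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

# Concatenate title, description, and reviews into one long input (illustrative text)
fields = ["Acme 65W GaN Charger", "Compact USB-C wall charger with foldable plug ...", "Reviews: ..."]
page = f" {tok.sep_token} ".join(fields)
inputs = tok(page, truncation=True, max_length=7999, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state
print(inputs["input_ids"].shape, hidden.shape)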

Model sources


Citation

If you use this model, please cite the repository:

@software{rexbert_mini_2025,
  title        = {RexBERT-mini},
  author       = {Bajaj, Rahul},
  year         = {2025},
  url          = {}
}

Contact & maintenance

  • Author(s): Rahul Bajaj

  • Issues / questions: Open an issue or discussion on the HF model page.

