RexBERT-mini

An efficient English encoder-only masked-language model with a ~8k-token context window, targeted at e-commerce and retail NLP.


Model summary

  • Model type: ModernBertForMaskedLM (encoder-only, masked-language modeling head)
  • Domain: e-commerce/retail/shopping
  • Language: English
  • Context length: 7,999–8,192 tokens (config max_position_embeddings=7999; ModernBERT supports up to 8,192)
  • License: Apache-2.0

Intended uses & limitations

Direct use

  • Fill-mask and cloze completion (e.g., product titles, attributes, query reformulation).
  • Embeddings / feature extraction for classification, clustering, retrieval re-ranking, and semantic search over retail catalogs and queries (via pooled encoder states); ModernBERT is a drop-in BERT-style encoder. A pooling sketch follows this list.
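
A minimal sketch of mean-pooled embeddings for similarity search; the pooling recipe and product strings are illustrative, not an official embedding pipeline for this model:

import torch
from transformers import AutoTokenizer, AutoModel

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)  # backbone without the MLM head

def embed(texts):
    # Mean-pool the last hidden states, ignoring padding positions
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # (B, T, 1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)            # (B, H)
    return torch.nn.functional.normalize(pooled, dim=-1)

query = embed(["wireless noise cancelling headphones"])
docs = embed(["Sony WH-1000XM5 Wireless Headphones", "Stainless steel water bottle 1L"])
print(query @ docs.T)  # cosine similarities (rows are L2-normalized)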

Downstream use

  • Fine-tune for product categorization, attribute extraction, NER, intent classification, and retrieval-augmented ranking in commerce search & browse, using a task head or pooled embeddings. A minimal sketch follows below.
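
A minimal fine-tuning sketch for product categorization, assuming a sequence-classification head via AutoModelForSequenceClassification; the three-way label set is hypothetical:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
clf = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=3,  # hypothetical 3-way category scheme
    id2label={0: "electronics", 1: "apparel", 2: "home"},
    label2id={"electronics": 0, "apparel": 1, "home": 2},
)

batch = tok(["usb-c fast charger 65w"], return_tensors="pt")
print(clf(**batch).logits.shape)  # torch.Size([1, 3]); train with Trainer or a custom loop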

Out-of-scope / not recommended

  • Autoregressive text generation or chat; this is not a decoder LLM. Use decoder or seq2seq models for long-form generation.

How to get started

from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = f"The customer purchased a {tok.mask_token} with free shipping."
inputs = tok(text, return_tensors="pt")
logits = model(**inputs).logits

# Top-5 predictions at the masked position
mask_pos = (inputs["input_ids"] == tok.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print(tok.convert_ids_to_tokens(top_ids.tolist()))
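
Equivalently, the high-level fill-mask pipeline wraps tokenization, masking, and top-k decoding; this sketch assumes the tokenizer's mask token is the literal [MASK] string:

from transformers import pipeline

fill = pipeline("fill-mask", model="thebajajra/RexBERT-mini")
for cand in fill("The customer purchased a [MASK] with free shipping.", top_k=5):
    print(cand["token_str"], round(cand["score"], 3))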

Model details

Architecture (from config)

  • Backbone: ModernBERT (model_type: "modernbert", architectures: ["ModernBertForMaskedLM"])
  • Layers / heads / width: 19 encoder layers, 8 attention heads, hidden size 512; intermediate (MLP) size 768; GELU activations.
  • Attention: local window 128 with global attention every 3 layers; RoPE θ = 160k (local & global).
  • Positional strategy: position_embedding_type: "sans_pos". (These values can be checked against the config; see the sketch below.)
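
The figures above can be confirmed from the published config; the field names below follow transformers' ModernBertConfig and are assumed to apply here:

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("thebajajra/RexBERT-mini")
print(cfg.num_hidden_layers, cfg.num_attention_heads, cfg.hidden_size, cfg.intermediate_size)
print(cfg.max_position_embeddings)                           # context limit
print(cfg.local_attention, cfg.global_attn_every_n_layers)   # local window / global cadence
print(cfg.local_rope_theta, cfg.global_rope_theta)           # RoPE bases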

Training data & procedure


Performance Highlights

MLM – Token Classification (E-Commerce)

RexBERT-mini outperforms DistilBERT on all token classification tasks.

Product Title: results figure on the model page.

Product Description: results figure on the model page.


Technical notes for practitioners

  • Pooling: use mean pooling over the last hidden states (the config's classifier_pooling is "mean"), or task-specific pooling.
  • Long sequences: leverage the extended context for product pages, multi-turn queries, or concatenated fields; ModernBERT uses efficient local/global attention and RoPE for long inputs (see the sketch after this list).
  • Libraries: tested with transformers>=4.48.0.
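
A minimal sketch of encoding one long, concatenated product page; the field text is illustrative, the separator assumes the tokenizer exposes a BERT-style sep_token, and length is capped at the config's 7,999-token limit:

import torch
from transformers import AutoTokenizer, AutoModel

model_id = "thebajajra/RexBERT-mini"
tok = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

# Concatenate title, description, and reviews into one long input (illustrative text)
fields = ["Acme 65W GaN Charger", "Compact USB-C wall charger with foldable plug ...", "Reviews: ..."]
page = f" {tok.sep_token} ".join(fields)
inputs = tok(page, truncation=True, max_length=7999, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state
print(inputs["input_ids"].shape, hidden.shape)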

Model sources


Citation

If you use this model, please cite the repository:

@software{rexbert_mini_2025,
  title        = {RexBERT-mini},
  author       = {Bajaj, Rahul},
  year         = {2025},
  url          = {}
}

Contact & maintenance

  • Author(s): Rahul Bajaj

  • Issues / questions: Open an issue or discussion on the HF model page.

