⭐ GLiClass: Generalist and Lightweight Model for Sequence Classification

This is an efficient zero-shot classifier inspired by the GLiNER work. It performs on par with a cross-encoder while being more compute-efficient, because all labels are classified in a single forward pass.

It can be used for topic classification, sentiment analysis and as a reranker in RAG pipelines.

The model was trained on synthetic and licensed data that allow commercial use, so it can be used in commercial applications.

This version of the model uses a layer-wise selection of features that enables a better understanding of different levels of language. The backbone model is ModernBERT-base, which effectively processes long sequences.

How to use:

First, install the GLiClass library:

pip install gliclass
pip install -U "transformers>=4.48.0"

Then initialize the model and the pipeline:

from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("alexandrlukashov/gliclass_msmarco_merged")
tokenizer = AutoTokenizer.from_pretrained("alexandrlukashov/gliclass_msmarco_merged", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "I want to live in New York."
labels = [
    'York is a cathedral city in North Yorkshire, England, with Roman origins',
    'San Francisco,[23] officially the City and County of San Francisco, is a commercial, financial, and cultural center within Northern California, United States.',
    'New York, often called New York City (NYC),[b] is the most populous city in the United States',
    "New York City is the third album by electronica group Brazilian Girls, released in 2008.",
    "New York City was an American R&B vocal group.",
    "New York City is an album by the Peter Malick Group featuring Norah Jones.",
    "New York City: The Album is the debut studio album by American rapper Troy Ave. ",
    '"New York City" is a song by British new wave band The Armoury Show',
]
results = pipeline(text, labels, threshold=0.5)[0]  # one input text, so take the first list of results
for result in results:
    print(result["label"], "=>", result["score"])
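
When the model is used as a reranker, the scored labels usually need to be ordered before they are passed on. The sketch below assumes only the output shape shown above (a list of dicts with "label" and "score" keys). The `rerank` helper and the example scores are hypothetical and not part of the GLiClass library.

```python
from typing import Dict, List, Optional, Union

Result = Dict[str, Union[str, float]]


def rerank(results: List[Result], top_k: Optional[int] = None) -> List[Result]:
    """Sort pipeline results by score, highest first, and optionally keep top_k.

    Hypothetical helper: it relies only on the "label" and "score" keys
    that the pipeline output above exposes.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Made-up scores in the same shape as one entry of the pipeline output.
example = [
    {"label": "passage about York, England", "score": 0.12},
    {"label": "passage about New York City", "score": 0.93},
    {"label": "passage about an album", "score": 0.40},
]

for item in rerank(example, top_k=2):
    print(item["label"], "=>", item["score"])
```

Note that a threshold of 0.5 drops low-scoring candidates before ranking. For reranking, it is probably preferable to lower the threshold so that every candidate keeps a score, though the best value depends on the application.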

Benchmarking:

Dataset	Base NDCG@10	GLiClass NDCG@10
NanoArguAna	0.489	0.525
NanoClimateFEVER	0.318	0.870
NanoDBPedia	0.614	0.871
NanoFEVER	0.809	0.770
NanoFiQA2018	0.437	0.719
NanoHotpotQA	0.828	0.647
NanoMSMARCO	0.540	0.445
NanoNFCorpus	0.325	0.710
NanoNQ	0.501	0.588
NanoQuoraRetrieval	0.869	0.540
NanoSCIDOCS	0.335	0.917
NanoSciFact	0.710	0.652
NanoTouche2020	0.694	0.490
NanoBEIR (mean)	0.574	0.673
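
For readers unfamiliar with the metric, the following is a minimal sketch of NDCG@10 using the common linear-gain definition. The evaluation protocol behind the table is not stated here, so this illustrates the metric and does not reproduce the benchmark. It also assumes that the input list holds the relevance grade of every candidate, in the order the model ranked them.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k ranked items (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking.

    Assumption: ranked_relevances covers all candidates for the query,
    so sorting it in descending order gives the ideal ordering.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# A ranking that places the only relevant passage second instead of first.
print(round(ndcg_at_k([0, 1, 0, 0]), 3))
```

A score of 1.0 means the relevant passages are ranked in the ideal order. Placing them lower in the top ten reduces the score logarithmically.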
