⭐ GLiClass: Generalist and Lightweight Model for Sequence Classification
This is an efficient zero-shot classifier inspired by the GLiNER work. It performs on par with a cross-encoder while being more compute-efficient, because classification is done in a single forward pass.
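The efficiency argument comes down to how many encoder passes are needed to score N candidate labels for one text. The sketch below is only an illustration of that counting argument: the `<<LABEL>>` and `<<SEP>>` marker strings and both helper functions are hypothetical, and the real GLiClass input format may differ.

```python
# Hypothetical illustration: a cross-encoder needs one forward pass per (text, label)
# pair, while a GLiClass-style encoder packs all labels and the text into one sequence.

LABEL_TOKEN = "<<LABEL>>"  # assumed label marker; not taken from the GLiClass code
SEP_TOKEN = "<<SEP>>"      # assumed separator between the labels and the text


def cross_encoder_inputs(text: str, labels: list[str]) -> list[tuple[str, str]]:
    """One (text, label) pair per label, so N labels cost N forward passes."""
    return [(text, label) for label in labels]


def joint_input(text: str, labels: list[str]) -> str:
    """All labels followed by the text in a single sequence: one forward pass."""
    label_part = "".join(f"{LABEL_TOKEN}{label}" for label in labels)
    return f"{label_part}{SEP_TOKEN}{text}"


if __name__ == "__main__":
    text = "I want to live in New York."
    labels = ["travel", "sports", "real estate"]
    print("cross-encoder passes:", len(cross_encoder_inputs(text, labels)))
    print("joint passes: 1")
    print(joint_input(text, labels))
```

The trade-off is that the joint sequence grows with the number and length of labels, which is one reason a backbone that handles long inputs well is useful here.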
It can be used for topic classification, sentiment analysis, and as a reranker in RAG pipelines.
The model was trained on synthetic data and on licensed data that permit commercial use, so it can be used in commercial applications.
This version of the model uses layer-wise feature selection, which helps it capture information from different levels of language. The backbone is ModernBERT-base, which handles long sequences effectively.
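The card does not describe how layers are selected. One common, generic way to combine features from several encoder layers is a learned softmax-weighted "scalar mix" (as used in ELMo). The sketch below shows that generic technique only; it is an assumption for illustration, not GLiClass's actual mechanism, and `scalar_mix` and `gamma` are hypothetical names.

```python
import math


def softmax(weights):
    """Numerically stable softmax over a list of floats."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layer_outputs, layer_weights, gamma=1.0):
    """Combine one feature vector per encoder layer into a single vector.

    layer_outputs: L vectors (lists of floats), one per layer.
    layer_weights: L unnormalized scores (learned parameters in a real model).
    Returns gamma * sum_l softmax(layer_weights)[l] * layer_outputs[l].
    """
    if len(layer_outputs) != len(layer_weights):
        raise ValueError("need exactly one weight per layer")
    probs = softmax(layer_weights)
    mixed = [0.0] * len(layer_outputs[0])
    for p, vec in zip(probs, layer_outputs):
        for i, x in enumerate(vec):
            mixed[i] += p * x
    return [gamma * x for x in mixed]


if __name__ == "__main__":
    # Three toy "layers" with 2-dimensional features; equal weights give a plain average.
    layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(scalar_mix(layers, [0.0, 0.0, 0.0]))
    # A large weight on the first layer makes the mix follow that layer.
    print(scalar_mix(layers, [8.0, 0.0, 0.0]))
```

Learning the weights lets training decide how much shallow (more lexical) versus deep (more semantic) layers contribute, instead of always using only the last layer.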
How to use:
First, install the GLiClass library and a recent version of transformers:
```bash
pip install gliclass
pip install -U "transformers>=4.48.0"
```
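To confirm the installed transformers meets the `>=4.48.0` requirement, a small check like the following can help. It is a sketch: `parse_release` and `meets_minimum` are hypothetical helpers that compare only the numeric release parts and ignore pre-release suffixes such as `.dev0` or `rc1`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Turn '4.48.0' or '4.49.0.dev0' into (4, 48, 0), keeping leading digits only."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.48.0") -> bool:
    """True if the installed release tuple is at least the minimum release tuple."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old, please upgrade"
        print(f"transformers {installed}: {status}")
```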
Then initialize the model and the pipeline:
```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("alexandrlukashov/gliclass_msmarco_merged")
tokenizer = AutoTokenizer.from_pretrained("alexandrlukashov/gliclass_msmarco_merged", add_prefix_space=True)
# Use device='cpu' if no CUDA GPU is available.
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "I want to live in New York."
# Candidate passages are passed as labels; each one is scored against the text.
labels = [
    'York is a cathedral city in North Yorkshire, England, with Roman origins',
    'San Francisco,[23] officially the City and County of San Francisco, is a commercial, financial, and cultural center within Northern California, United States.',
    'New York, often called New York City (NYC),[b] is the most populous city in the United States',
    "New York City is the third album by electronica group Brazilian Girls, released in 2008.",
    "New York City was an American R&B vocal group.",
    "New York City is an album by the Peter Malick Group featuring Norah Jones.",
    "New York City: The Album is the debut studio album by American rapper Troy Ave. ",
    '"New York City" is a song by British new wave band The Armoury Show',
]

results = pipeline(text, labels, threshold=0.5)[0]  # [0]: results for the first (and only) text
for result in results:
    print(result["label"], "=>", result["score"])
```
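To use the model as a reranker in a RAG pipeline, the scored passages can be sorted by score and only the top few kept as context. A minimal sketch, assuming the per-text result list has the `label`/`score` shape printed above; `rerank` and `top_k` are hypothetical names, and the scores below are placeholders, not model outputs. If `threshold` filters out passages scoring below it, a lower threshold is likely preferable when every candidate should be ranked.

```python
def rerank(results, top_k=3):
    """Return the top_k results (dicts with "label" and "score"), highest score first."""
    return sorted(results, key=lambda r: r["score"], reverse=True)[:top_k]


if __name__ == "__main__":
    # Placeholder scores for illustration only; these are NOT real model outputs.
    results = [
        {"label": "passage A", "score": 0.21},
        {"label": "passage B", "score": 0.93},
        {"label": "passage C", "score": 0.64},
    ]
    for r in rerank(results, top_k=2):
        print(r["label"], "=>", r["score"])
```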
Benchmarking:
Dataset | Base NDCG@10 | GLiClass NDCG@10 |
---|---|---|
NanoArguAna | 0.489 | 0.525 |
NanoClimateFEVER | 0.318 | 0.870 |
NanoDBPedia | 0.614 | 0.871 |
NanoFEVER | 0.809 | 0.770 |
NanoFiQA2018 | 0.437 | 0.719 |
NanoHotpotQA | 0.828 | 0.647 |
NanoMSMARCO | 0.540 | 0.445 |
NanoNFCorpus | 0.325 | 0.710 |
NanoNQ | 0.501 | 0.588 |
NanoQuoraRetrieval | 0.869 | 0.540 |
NanoSCIDOCS | 0.335 | 0.917 |
NanoSciFact | 0.710 | 0.652 |
NanoTouche2020 | 0.694 | 0.490 |
NanoBEIR (mean) | 0.574 | 0.673 |
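NDCG@10 rewards rankings that place the most relevant passages near the top of the first ten results, and the NanoBEIR (mean) row is consistent, up to rounding, with an unweighted average of the 13 per-dataset scores (about 0.5745 and 0.6726). A minimal sketch of both, assuming linear gains and a log2 rank discount; evaluation libraries differ in such details, so this is not necessarily the exact scorer used for the table.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of the first k results (linear gain, log2 discount)."""
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Per-dataset NDCG@10 values copied from the table above, in table order.
base = [0.489, 0.318, 0.614, 0.809, 0.437, 0.828, 0.540,
        0.325, 0.501, 0.869, 0.335, 0.710, 0.694]
gliclass = [0.525, 0.870, 0.871, 0.770, 0.719, 0.647, 0.445,
            0.710, 0.588, 0.540, 0.917, 0.652, 0.490]

if __name__ == "__main__":
    # Toy example: graded relevance of passages in the order a reranker returned them.
    print(f"toy NDCG@10: {ndcg_at_k([2, 3, 0, 1]):.3f}")
    print(f"base mean: {sum(base) / len(base):.4f}")
    print(f"GLiClass mean: {sum(gliclass) / len(gliclass):.4f}")
```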