Introduction
ViDense is a Vietnamese embedding model. Fine-tuned and enhanced with tailored methods, it incorporates advanced techniques to optimize the performance of text embeddings across a variety of applications.
Model Configuration and Methods:
- Base Model: FacebookAI/xlm-roberta-large
- Trained for 10 epochs with a train batch size of 2048.
- Utilizes a 3-phase training approach, where the best checkpoint from each phase serves as the base model for the next.
- Position Encoding: Rotary Position Encoding
- Attention: Blockwise Parallel Transformer
- Pooling: Mean Pooling
- Momentum Encoder: Incorporates MoCo (Momentum Contrast) to enhance in-batch negative sampling (a minimal sketch of this idea follows the list below).
- Rank Encoder: Introduces a rank encoder to account for transitive positive relationships. By treating positives of positives as relevant to the anchor, it re-ranks the corpus using the Spearman metric and integrates the resulting Spearman weights into the loss calculation to improve ranking.
- Loss Function: Cross Entropy Loss
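As an illustration of how a MoCo-style momentum encoder and a cross-entropy (InfoNCE) contrastive loss typically fit together, here is a minimal sketch. It is not the actual ViDense training code; the toy encoders, `momentum`, `temperature`, and queue are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Minimal MoCo-style sketch (illustrative only; not the exact ViDense training code).
momentum = 0.999      # assumed EMA coefficient for the key encoder
temperature = 0.05    # assumed softmax temperature

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=momentum):
    # Key encoder follows the query encoder as an exponential moving average.
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

def contrastive_loss(q, k, queue):
    # q, k: (batch, dim) embeddings of anchors and their positives.
    # queue: (queue_size, dim) past key embeddings reused as extra negatives.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    pos = torch.einsum('bd,bd->b', q, k).unsqueeze(-1)               # (batch, 1)
    neg = torch.einsum('bd,nd->bn', q, F.normalize(queue, dim=-1))   # (batch, queue_size)
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive sits at index 0
    return F.cross_entropy(logits, labels)

# Tiny end-to-end example with toy encoders (illustrative only).
encoder_q = torch.nn.Linear(16, 8)
encoder_k = torch.nn.Linear(16, 8)
encoder_k.load_state_dict(encoder_q.state_dict())

x_anchor, x_positive = torch.randn(4, 16), torch.randn(4, 16)
queue = torch.randn(32, 8)                   # assumed queue of past key embeddings
with torch.no_grad():
    k = encoder_k(x_positive)
loss = contrastive_loss(encoder_q(x_anchor), k, queue)
loss.backward()
momentum_update(encoder_q, encoder_k)
print(loss.item())
```

The queue of past key embeddings enlarges the pool of negatives beyond the current batch, which is the benefit attributed to MoCo above; the Rank Encoder's Spearman weighting described in the list is not shown here.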
Usage
```
pip install -U transformers
```

```python
import torch
from transformers import AutoModel, AutoTokenizer


def avg_pooling(attention_mask, outputs):
    # Mean pooling over token embeddings, ignoring padded positions.
    last_hidden = outputs.last_hidden_state
    return (last_hidden * attention_mask.unsqueeze(-1)).sum(1) / attention_mask.sum(-1).unsqueeze(-1)


tokenizer = AutoTokenizer.from_pretrained('namdp-ptit/ViDense')
model = AutoModel.from_pretrained('namdp-ptit/ViDense')

sentences = [
    'Tỉnh nào có diện tích lớn nhất Việt Nam',   # Which province has the largest area in Vietnam?
    'Tỉnh nào có diện tích nhỏ nhất Việt Nam',   # Which province has the smallest area in Vietnam?
    'Tỉnh nào có diện tích rộng nhất Việt Nam'   # Which province has the widest area in Vietnam?
]

inputs = tokenizer(sentences, return_tensors='pt', padding=True)
with torch.no_grad():
    outputs = model(**inputs)
outputs = avg_pooling(inputs['attention_mask'], outputs)

cosine_sim_1 = torch.nn.functional.cosine_similarity(
    outputs[0].unsqueeze(0),
    outputs[1].unsqueeze(0)
)
cosine_sim_2 = torch.nn.functional.cosine_similarity(
    outputs[0].unsqueeze(0),
    outputs[2].unsqueeze(0)
)
print(cosine_sim_1.item())  # 0.056096598505973816
print(cosine_sim_2.item())  # 0.9861876964569092
```
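For retrieval-style use (the setting evaluated in the Performance section below), a query and a set of documents can be embedded and ranked by cosine similarity. The following is a minimal sketch reusing `tokenizer`, `model`, and `avg_pooling` from the snippet above; the example corpus and `top_k` choice are made up for illustration.

```python
import torch
import torch.nn.functional as F

corpus = [
    'Nghệ An là tỉnh có diện tích lớn nhất Việt Nam',   # assumed example documents
    'Bắc Ninh là tỉnh có diện tích nhỏ nhất Việt Nam',
]
query = 'Tỉnh nào có diện tích lớn nhất Việt Nam'

def encode(texts):
    batch = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
    with torch.no_grad():
        out = model(**batch)
    emb = avg_pooling(batch['attention_mask'], out)
    return F.normalize(emb, dim=-1)        # unit-normalize so dot product equals cosine similarity

doc_emb = encode(corpus)                   # (num_docs, dim)
query_emb = encode([query])                # (1, dim)

scores = query_emb @ doc_emb.T             # cosine similarities, shape (1, num_docs)
top_k = scores.topk(k=min(2, len(corpus)), dim=-1)
for score, idx in zip(top_k.values[0], top_k.indices[0]):
    print(f'{score.item():.4f}  {corpus[idx.item()]}')
```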
Performance
Below is a comparison of the results I achieved against some other embedding models on three benchmarks: ZAC, WebFAQ, and OwiFaq, using Recall@3 as the metric.
| Model Name | ZAC | WebFAQ | OwiFaq |
|---|---|---|---|
| namdp-ptit/ViDense | 54.72 | 82.26 | 85.62 |
| VoVanPhuc/sup-SimCSE-VietNamese-phobert-base | 53.64 | 81.52 | 85.02 |
| keepitreal/vietnamese-sbert | 50.45 | 80.54 | 78.58 |
| BAAI/bge-m3 | 46.12 | 83.45 | 86.08 |
Here is some information about these three benchmarks:
- ZAC: the train and test splits are merged into a single benchmark with ~3,200 queries and ~330K documents in the corpus.
- WebFAQ and OwiFaq: the train and test splits are merged into a single benchmark with ~124K queries and ~124K documents in the corpus.
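Recall@3 on benchmarks like these is typically computed as the fraction of queries for which at least one relevant document appears among the top-3 retrieved results. The snippet below is a minimal sketch of that computation, not the evaluation code actually used; `ranked_doc_ids` and `relevant_doc_ids` are assumed inputs.

```python
def recall_at_k(ranked_doc_ids, relevant_doc_ids, k=3):
    # ranked_doc_ids: per-query list of document ids, best match first.
    # relevant_doc_ids: per-query set of ground-truth relevant document ids.
    hits = 0
    for ranking, relevant in zip(ranked_doc_ids, relevant_doc_ids):
        if relevant & set(ranking[:k]):
            hits += 1
    return hits / len(ranked_doc_ids)

# Example: 2 of 3 queries have a relevant document in their top-3 -> ~0.67
print(recall_at_k(
    [['d1', 'd2', 'd3'], ['d9', 'd8', 'd7'], ['d4', 'd5', 'd6']],
    [{'d2'}, {'d1'}, {'d5'}],
))
```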
Contact
Email: [email protected]
LinkedIn: Dang Phuong Nam
Facebook: Phương Nam
Support The Project
If you find this project helpful and wish to support its ongoing development, here are some ways you can contribute:
- Star the Repository: Show your appreciation by starring the repository. Your support motivates further development and enhancements.
- Contribute: I welcome your contributions! You can help by reporting bugs, submitting pull requests, or suggesting new features.
- Donate: If you’d like to support financially, consider making a donation. You can donate through:
- Vietcombank: 9912692172 - DANG PHUONG NAM
Thank you for your support!
Citation
Please cite as:
```bibtex
@misc{ViDense,
  title={ViDense: An Embedding Model for Vietnamese Long Context},
  author={Nam Dang Phuong},
  year={2025},
  publisher={Huggingface},
}
```