E5-Base-Math: Fine-tuned Vietnamese Math Embedding Model
Model Description
This is a fine-tuned version of intfloat/multilingual-e5-base optimized for Vietnamese mathematics content. The model is specifically trained for embedding mathematical concepts, definitions, and problem-solving content in Vietnamese.
Training Details
Base Model
- Base model: intfloat/multilingual-e5-base
- Fine-tuning objective: Information Retrieval / Sentence Embedding
- Training date: 2025-06-24
Training Configuration
- Batch size: 4
- Learning rate: 2e-05
- Epochs: 3
- Max sequence length: 256
- Warmup steps: 100
Training Data
- Domain: Vietnamese Mathematics
- Training examples: 2055
- Validation examples: 229
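
The exact training script is not published with this card. The sketch below shows how the hyperparameters and data sizes listed above could be wired into the sentence-transformers v3 trainer; the (anchor, positive) pair format and the MultipleNegativesRankingLoss choice are assumptions, not the released recipe.

```python
# Hypothetical reconstruction of the fine-tuning setup: hyperparameters match the card,
# but the dataset format and the loss function are assumptions.
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("intfloat/multilingual-e5-base")
model.max_seq_length = 256  # max sequence length from the card

# Assumed (query, positive passage) pairs with E5 prefixes already applied;
# replace with the real ~2,055 training examples
train_dataset = Dataset.from_dict({
    "anchor": ["query: Định nghĩa hàm số đồng biến là gì?"],
    "positive": ["passage: Hàm số đồng biến trên khoảng (a;b) là hàm số mà với mọi x1 < x2 thì f(x1) < f(x2)"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="e5-base-math",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    warmup_steps=100,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
```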
Usage
Using SentenceTransformers
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('ThanhLe0125/e5-base-math')

# Encode queries (the "query: " prefix matches the E5 input format)
# "Định nghĩa hàm số đồng biến là gì?" = "What is the definition of an increasing function?"
queries = ["query: Định nghĩa hàm số đồng biến là gì?"]
query_embeddings = model.encode(queries)

# Encode passages/documents with the "passage: " prefix
# The passage states: a function is increasing on (a;b) if x1 < x2 implies f(x1) < f(x2)
passages = ["passage: Hàm số đồng biến trên khoảng (a;b) là hàm số mà với mọi x1 < x2 thì f(x1) < f(x2)"]
passage_embeddings = model.encode(passages)

# Cosine similarity between query and passage embeddings
similarity = cosine_similarity(query_embeddings, passage_embeddings)
```
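
One optional variant, not required by the card: since cosine similarity is the intended comparison, you can ask `encode()` to L2-normalize the vectors so that a plain dot product gives the same score.

```python
# Optional: L2-normalize at encode time so a dot product equals cosine similarity
query_embeddings = model.encode(queries, normalize_embeddings=True)
passage_embeddings = model.encode(passages, normalize_embeddings=True)
similarity = query_embeddings @ passage_embeddings.T
```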
For RAG Applications
```python
# Recommended usage pattern for RAG: always add the E5 prefixes
def encode_query(query_text):
    return model.encode([f"query: {query_text}"])

def encode_passage(passage_text):
    return model.encode([f"passage: {passage_text}"])

# Example usage (reuses `model` and `cosine_similarity` from the snippet above)
query_emb = encode_query("Định nghĩa hàm số đồng biến")
passage_emb = encode_passage("Hàm số đồng biến là...")

# Calculate similarity
similarity = cosine_similarity(query_emb, passage_emb)[0][0]
print(f"Similarity: {similarity:.4f}")
```
Applications
- Information Retrieval: Finding relevant mathematical content
- RAG Systems: Retrieval-Augmented Generation for math Q&A
- Semantic Search: Searching through mathematical documents
- Content Recommendation: Suggesting related mathematical concepts
Performance
This model has been fine-tuned specifically for Vietnamese mathematical content and should perform better than the base model for math-related queries in Vietnamese.
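
No benchmark numbers are included on this card. If you want to quantify the difference on your own data, one option is sentence-transformers' `InformationRetrievalEvaluator`, which can score both this model and the base model on the same query/passage relevance set; the tiny evaluation set below is a placeholder.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder evaluation data: replace with your own queries, passages, and relevance labels
queries = {"q1": "query: Định nghĩa hàm số đồng biến là gì?"}
corpus = {"d1": "passage: Hàm số đồng biến trên khoảng (a;b) là hàm số mà với mọi x1 < x2 thì f(x1) < f(x2)"}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="vi-math")

# Compare the fine-tuned model against its base model on the same evaluation set
for name in ("ThanhLe0125/e5-base-math", "intfloat/multilingual-e5-base"):
    model = SentenceTransformer(name)
    print(name, evaluator(model))  # retrieval metrics (NDCG/MRR/Recall@k in recent versions)
```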
Languages
- Vietnamese (primary)
- English (inherited from base model)
License
This model inherits the license from the base model, intfloat/multilingual-e5-base.
Citation
If you use this model, please cite:
```bibtex
@misc{e5-base-math,
  author       = {ThanhLe},
  title        = {E5-Base-Math: Fine-tuned Vietnamese Math Embedding Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ThanhLe0125/e5-base-math}}
}
```
Contact
For questions or issues, please contact via the repository discussions.