E5-Base-Math: Fine-tuned Vietnamese Math Embedding Model

Model Description

This is a fine-tuned version of intfloat/multilingual-e5-base optimized for Vietnamese mathematics content. The model is trained specifically to embed mathematical concepts, definitions, and problem-solving content in Vietnamese.

Training Details

Base Model

  • Base model: intfloat/multilingual-e5-base
  • Fine-tuning objective: Information Retrieval / Sentence Embedding
  • Training date: 2025-06-24

Training Configuration

  • Batch size: 4
  • Learning rate: 2e-05
  • Epochs: 3
  • Max sequence length: 256
  • Warmup steps: 100
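
The training script itself is not included in this card. As a rough sketch, the hyperparameters above could map onto a standard sentence-transformers fine-tuning loop as shown below; the loss function (MultipleNegativesRankingLoss) and the query/passage pair format are assumptions, not details taken from the card.

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Start from the base checkpoint and cap the sequence length as reported above
model = SentenceTransformer('intfloat/multilingual-e5-base')
model.max_seq_length = 256

# Hypothetical (query, passage) pairs; the real set contains 2055 training examples
train_examples = [
    InputExample(texts=[
        "query: Định nghĩa hàm số đồng biến là gì?",
        "passage: Hàm số đồng biến trên khoảng (a;b) là hàm số mà với mọi x1 < x2 thì f(x1) < f(x2)",
    ]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=4)

# Assumed loss: in-batch negatives ranking, a common choice for retrieval fine-tuning
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    warmup_steps=100,
    optimizer_params={"lr": 2e-5},
)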

Training Data

  • Domain: Vietnamese Mathematics
  • Training examples: 2055
  • Validation examples: 229

Usage

Using SentenceTransformers

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('ThanhLe0125/e5-base-math')

# Encode queries (the "query: " prefix improves retrieval quality for E5 models)
queries = ["query: Định nghĩa hàm số đồng biến là gì?"]  # "What is the definition of an increasing function?"
query_embeddings = model.encode(queries)

# Encode passages/documents (use the "passage: " prefix)
passages = ["passage: Hàm số đồng biến trên khoảng (a;b) là hàm số mà với mọi x1 < x2 thì f(x1) < f(x2)"]
passage_embeddings = model.encode(passages)

# Calculate cosine similarity between queries and passages
similarity = cosine_similarity(query_embeddings, passage_embeddings)
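
E5-family embeddings are typically compared with cosine similarity. Equivalently, you can L2-normalize the vectors at encode time (a built-in option of encode) and use a plain dot product:

# Equivalent: L2-normalize the embeddings at encode time, then cosine similarity
# reduces to a plain dot product
query_embeddings = model.encode(queries, normalize_embeddings=True)
passage_embeddings = model.encode(passages, normalize_embeddings=True)
similarity = query_embeddings @ passage_embeddings.T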

For RAG Applications

# Recommended usage for RAG: always prepend the E5 prefixes before encoding
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('ThanhLe0125/e5-base-math')

def encode_query(query_text):
    return model.encode([f"query: {query_text}"])

def encode_passage(passage_text):
    return model.encode([f"passage: {passage_text}"])

# Example usage
query_emb = encode_query("Định nghĩa hàm số đồng biến")  # "Definition of an increasing function"
passage_emb = encode_passage("Hàm số đồng biến là...")

# Calculate similarity between the query and the passage
similarity = cosine_similarity(query_emb, passage_emb)[0][0]
print(f"Similarity: {similarity:.4f}")

Applications

  • Information Retrieval: Finding relevant mathematical content
  • RAG Systems: Retrieval-Augmented Generation for math Q&A
  • Semantic Search: Searching through mathematical documents
  • Content Recommendation: Suggesting related mathematical concepts

Performance

This model has been fine-tuned specifically on Vietnamese mathematical content and is therefore expected to outperform the base model on math-related queries in Vietnamese.
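
The card reports no benchmark numbers. One way to measure retrieval quality on your own Vietnamese math data is the InformationRetrievalEvaluator from sentence-transformers; the query and passage IDs below are placeholders.

from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder evaluation data: map query IDs and passage IDs to texts,
# and list the relevant passage IDs for each query
queries = {"q1": "query: Định nghĩa hàm số đồng biến là gì?"}
corpus = {"p1": "passage: Hàm số đồng biến trên khoảng (a;b) là hàm số mà với mọi x1 < x2 thì f(x1) < f(x2)"}
relevant_docs = {"q1": {"p1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="vn-math-dev")
results = evaluator(model)  # IR metrics such as Recall@k / MRR@k; exact return format depends on the library version
print(results)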

Languages

  • Vietnamese (primary)
  • English (inherited from base model)

License

This model inherits the license from the base model intfloat/multilingual-e5-base.

Citation

If you use this model, please cite:

@misc{e5-base-math,
  author = {ThanhLe},
  title = {E5-Base-Math: Fine-tuned Vietnamese Math Embedding Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ThanhLe0125/e5-base-math}}
}

Contact

For questions or issues, please contact via the repository discussions.
