Ita-Search 🇮🇹
Fine-tuned Qwen3-Embedding for Italian Semantic Retrieval
This model is a fine-tuned version of Qwen/Qwen3-Embedding-0.6B, optimized for Italian semantic retrieval with particular emphasis on query understanding and document ranking.
Model Description
- Model Type: Dense embedding model for semantic retrieval
- Base Model: Qwen/Qwen3-Embedding-0.6B
- Output Dimensionality: 1,024-dimensional dense vectors
- Maximum Sequence Length: 32,768 tokens
- Primary Language: Italian
- Similarity Function: Cosine similarity
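As a rough illustration of the similarity function listed above, the sketch below computes cosine similarity in plain Python. The toy 3-dimensional vectors are made up for the example and only stand in for the model's 1,024-dimensional output; the helper name `cosine_similarity` is not part of the model's API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values, not real model output).
query_vec = [1.0, 0.0, 1.0]
doc_vec = [2.0, 0.0, 2.0]    # same direction as the query, different length
other_vec = [0.0, 1.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query_vec, doc_vec))
print(cosine_similarity(query_vec, other_vec))
```

Because the score depends only on direction, vectors that differ only in length score the same, which is why embeddings are often compared this way without normalizing them first.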
Capabilities
Italian Semantic Retrieval
The model matches Italian queries to Italian documents and is particularly effective in technical and academic domains.
Domain Coverage
Trained on diverse Italian knowledge domains including:
- Medical & Health Sciences: Diagnostic imaging, clinical procedures, medical terminology
- STEM Fields: Physics, computer science, geology, engineering
- Professional Domains: Finance, law, agriculture, software development
- Educational Content: Historical studies, culinary arts, general knowledge
Query Understanding
Enhanced comprehension of:
- Conversational and informal Italian query patterns
- Technical terminology in Italian across domains
- Italian semantic concepts and nuances
- Complex multi-faceted questions in Italian
Training Data
The model was fine-tuned on a curated corpus of Italian semantic data, featuring high-quality triplets designed to capture semantic nuances across multiple domains. The dataset emphasizes:
- Hard negative mining: Strategic inclusion of semantically related but incorrect documents
- Italian language focus: Broad representation of Italian language patterns
- Domain diversity: Coverage of academic, professional, and conversational contexts in Italian
- Quality curation: Manual review and automated filtering for coherence and relevance
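To make the idea of a triplet with a hard negative more concrete, here is a minimal sketch. The training objective is not specified in this card, so the triplet margin loss below is an assumption chosen for illustration, not the method actually used. The `Triplet` class, the example sentences, the similarity values, and the `margin` value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Triplet:
    query: str
    positive: str       # document that answers the query
    hard_negative: str  # related topic, but does not answer the query


def triplet_margin_loss(sim_pos, sim_neg, margin=0.2):
    """Zero once the positive beats the negative by at least `margin`."""
    return max(0.0, margin - (sim_pos - sim_neg))


# Hypothetical example; not taken from the actual training corpus.
example = Triplet(
    query="Quali sono i sintomi dell'influenza?",
    positive="L'influenza si manifesta con febbre, tosse e dolori muscolari.",
    hard_negative="Il vaccino antinfluenzale viene somministrato ogni autunno.",
)

# Assumed similarity scores, only to show how the loss behaves.
print(triplet_margin_loss(sim_pos=0.90, sim_neg=0.30))  # well separated
print(triplet_margin_loss(sim_pos=0.50, sim_neg=0.45))  # hard negative too close
```

A hard negative is useful precisely in the second case: it scores close to the positive, so the loss stays above zero and keeps pushing the two apart.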
Usage
Basic Retrieval
from sentence_transformers import SentenceTransformer
# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("DeepMount00/Ita-Search")
# Italian query-document matching
query = "Come si distingue una faglia trascorrente da una normale?"
documents = [
"Le faglie trascorrenti sono caratterizzate da movimento orizzontale...",
"Le faglie normali si verificano a causa di stress estensionale...",
"Le strategie di gestione del portafoglio di investimenti..."
]
# Encode the query and the documents, each with its own prompt
query_embedding = model.encode(query, prompt="Represent this search query for finding relevant passages: ")
doc_embeddings = model.encode(documents, prompt="Represent this passage for retrieval: ")
# Score every document against the query
similarities = model.similarity(query_embedding, doc_embeddings)
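The snippet above stops at the similarity scores. One possible way to turn them into a ranked result list is sketched below in plain Python. The helper `rank_documents` and the toy scores are assumptions for illustration; with the real model, the scores would presumably come from something like `similarities[0].tolist()`, assuming `model.similarity` returns one row of scores per query.

```python
def rank_documents(scores, documents, top_k=None):
    """Pair each document with its score and sort from best to worst."""
    if len(scores) != len(documents):
        raise ValueError("need exactly one score per document")
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


documents = [
    "Le faglie trascorrenti sono caratterizzate da movimento orizzontale...",
    "Le faglie normali si verificano a causa di stress estensionale...",
    "Le strategie di gestione del portafoglio di investimenti...",
]

# Made-up scores, not real model output.
toy_scores = [0.71, 0.64, 0.12]

for score, doc in rank_documents(toy_scores, documents, top_k=2):
    print(f"{score:.2f}  {doc}")
```

Keeping the ranking step separate from encoding makes it easy to swap in a vector index later without touching the rest of the pipeline.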
Prompt Templates
The model is optimized for specific prompt templates:
- Queries: "Represent this search query for finding relevant passages: "
- Documents: "Represent this passage for retrieval: "
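A small helper can keep the two templates in one place so queries and documents are never encoded with the wrong prefix. The sketch below is only an illustration: `PROMPTS` and `apply_prompt` are hypothetical names, and it assumes the prompt is simply prepended to the text, which is how the `prompt` argument of `encode` in sentence-transformers is documented to behave.

```python
PROMPTS = {
    "query": "Represent this search query for finding relevant passages: ",
    "document": "Represent this passage for retrieval: ",
}


def apply_prompt(text, kind):
    """Prefix `text` with the template for `kind` ('query' or 'document')."""
    if kind not in PROMPTS:
        raise ValueError(f"unknown kind: {kind!r}")
    return PROMPTS[kind] + text


print(apply_prompt("Come si distingue una faglia trascorrente da una normale?", "query"))
print(apply_prompt("Le faglie normali si verificano a causa di stress estensionale...", "document"))
```

In practice the same dictionary values can be passed straight to `model.encode(..., prompt=PROMPTS["query"])` instead of building the string by hand.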
Applications
- Italian information retrieval systems
- Academic and technical document search in Italian
- Italian question-answering platforms
- Educational content recommendation for Italian speakers
- Professional knowledge base systems in Italian
Limitations
- Language coverage: Optimized specifically for the Italian language
- Domain specificity: Performance may vary on highly specialized domains not represented in training
Acknowledgments
This work builds upon the Qwen3-Embedding architecture and advances in contrastive learning for dense retrieval. We acknowledge the contributions of the Qwen team and the sentence-transformers community.
License: Inherits licensing terms from the base Qwen/Qwen3-Embedding-0.6B model.