A copy of Jina VDR in BEIR format for usage with MTEB
AI & ML interests
Search foundation models: embeddings, rerankers, small LMs for better search
Convert HTML content to LLM-friendly Markdown/JSON content
- ReaderLM-v2: Small Language Model for HTML to Markdown and JSON
  Paper • 2503.01151 • Published • 1
- jinaai/ReaderLM-v2
  Text Generation • 2B • Updated • 14.7k • 665
- jinaai/reader-lm-1.5b
  Text Generation • 2B • Updated • 1.47k • 600
- jinaai/reader-lm-0.5b
  Text Generation • 0.5B • Updated • 1.18k • 142
A collection of state-of-the-art multilingual neural rerankers
This collection lists our ColBERT-like late-interaction retriever models
A novel set of high-performance sentence embedding models.
- Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models
  Paper • 2307.11224 • Published • 6
- jinaai/jina-embedding-l-en-v1
  Sentence Similarity • Updated • 230 • 24
- jinaai/jina-embedding-b-en-v1
  Sentence Similarity • Updated • 4.26k • 8
- jinaai/jina-embedding-s-en-v1
  Sentence Similarity • Updated • 4.44k • 26
Up to roughly 1,000 images, with OCR text included
Multilingual multi-task general text embedding model
Multimodal text-image embeddings
- jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
  Paper • 2412.08802 • Published • 5
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever
  Paper • 2405.20204 • Published • 37
- jinaai/jina-clip-v2
  Feature Extraction • 0.9B • Updated • 38.2k • 260
- jinaai/jina-clip-v1
  Feature Extraction • 0.2B • Updated • 62.9k • 251
The V2 family of Jina Embeddings supports encoding long documents, with a sequence length of up to 8k (8,192) tokens.
- Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
  Paper • 2310.19923 • Published • 14
- Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings
  Paper • 2402.17016 • Published • 5
- jinaai/jina-embeddings-v2-base-en
  Feature Extraction • 0.1B • Updated • 194k • 727
- jinaai/jina-embeddings-v2-base-zh
  Feature Extraction • 0.2B • Updated • 39.1k • 240
Neural reranker models for the English language