Vietnamese Corpus
- Symato/cc • Updated Jul 11, 2023 • 102k • 2
- Symato/c4_vi-filtered_200GB • Updated Sep 27, 2024 • 38.6M • 521
- Symato/goods_vs_c4_cc_classifiers • Updated Jul 3, 2023 • 101k • 47
- Symato/madlad-400_vi • Updated Sep 27, 2024 • 54.8M • 297 • 1
RAG
RAG-related datasets and tools
- Symato/RAG_UltraDomain • Updated Sep 25, 2024 • 69 • 2
- jinaai/jina-colbert-v2 • 0.6B • Updated Jan 17 • 61.7k • 140
- ContextualBench-Leaderboard (Space, running) • View and submit LLM benchmark evaluations • 14
- samaya-ai/msmarco-w-instructions • Updated Sep 18, 2024 • 980k • 440 • 3
Visual Datasets
One image is worth a thousand words
- TIGER-Lab/VisualWebInstruct-Seed • Updated Mar 16 • 60.3k • 277 • 18
- 5CD-AI/Viet-ShareGPT-4o-Text-VQA • Updated Oct 1, 2024 • 42.7k • 105 • 54
- 5CD-AI/Viet-LAION-Gemini-VQA • Updated Oct 3, 2024 • 844k • 421 • 46
- vidore/colpali_train_set • Updated Jun 20 • 119k • 6.01k • 88
trimm_vocab
Vocabularies trimmed to keep only English and Vietnamese tokens, so the models are more compact and do not generate Chinese during use.
- Symato/Qwen2.5-7B-Instruct__trimm_vocab • Updated Oct 21, 2024 • 3
- Symato/bge-reranker-v2-m3__trimm_vocab__bf16 • 0.4B • Updated Oct 18, 2024 • 14
- Symato/bge-m3__trimm_vocab__bf16 • 0.4B • Updated Oct 22, 2024 • 18
- Symato/facebook_xlm-roberta-large__trimm_vocab__bf16 • 0.4B • Updated Oct 18, 2024 • 7
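The vocabulary-trimming idea can be sketched in plain Python. This is a hypothetical illustration, not the procedure used for the models listed: it assumes the vocabulary is a readable string-to-ID mapping and the embedding table is a list of rows indexed by token ID. As a simplification, it drops tokens containing CJK ideographs (using an assumed, partial list of Unicode ranges) rather than whitelisting English and Vietnamese. It then keeps all remaining tokens in their original ID order, copies the matching embedding rows, and returns an old-to-new ID map. The names `CJK_RANGES`, `contains_cjk` and `trim_vocab` are illustrative. Byte-level BPE vocabularies such as Qwen2.5's store tokens as byte-mapped strings, so a real pipeline would have to decode them first and also rebuild the merge rules; both steps are omitted here.

```python
from typing import Dict, List, Tuple

# Assumed, partial list of CJK ideograph blocks used to flag "Chinese" tokens.
CJK_RANGES = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]


def contains_cjk(token: str) -> bool:
    """Return True if any character of the token falls in an assumed CJK range."""
    return any(lo <= ord(ch) <= hi for ch in token for lo, hi in CJK_RANGES)


def trim_vocab(
    vocab: Dict[str, int], embeddings: List[List[float]]
) -> Tuple[Dict[str, int], List[List[float]], Dict[int, int]]:
    """Hypothetical helper: drop CJK tokens and compact IDs and embedding rows."""
    # Keep non-CJK tokens, ordered by original ID so relative order is preserved.
    kept = sorted((old_id, tok) for tok, old_id in vocab.items() if not contains_cjk(tok))
    new_vocab: Dict[str, int] = {}
    new_embeddings: List[List[float]] = []
    old_to_new: Dict[int, int] = {}
    for new_id, (old_id, tok) in enumerate(kept):
        new_vocab[tok] = new_id
        new_embeddings.append(embeddings[old_id])  # reuse the original embedding row
        old_to_new[old_id] = new_id
    return new_vocab, new_embeddings, old_to_new


if __name__ == "__main__":
    vocab = {"hello": 0, "你好": 1, "xin": 2, "chào": 3}
    embeddings = [[0.1], [0.2], [0.3], [0.4]]
    new_vocab, new_embeddings, old_to_new = trim_vocab(vocab, embeddings)
    print(new_vocab)       # {'hello': 0, 'xin': 1, 'chào': 2}
    print(new_embeddings)  # [[0.1], [0.3], [0.4]]
    print(old_to_new)      # {0: 0, 2: 1, 3: 2}
```

In a real model, the same old-to-new map would presumably also need to be applied to the output (LM head) rows when they are not tied to the input embeddings.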
Knowledge Base
Small but high quality
- Symato/KB_wikimedia • Updated Sep 27, 2024 • 1.29M • 50 • 1
- Symato/wikihow_vi-en-zh • Updated Sep 27, 2024 • 9.24k • 30 • 1
- Symato/KB_tve-selected-books • Updated Sep 28, 2024 • 10
Vietnamese LLMs
The good ones
- SeaLLMs/SeaLLMs-v3-7B-Chat • Text Generation • 8B • Updated Sep 2, 2024 • 1.17k • 67
- CohereLabs/c4ai-command-r-plus-08-2024 • Text Generation • 104B • Updated Oct 30 • 2.47k • 278
- google/gemma-2-27b-it • Text Generation • 27B • Updated Aug 27, 2024 • 542k • 556
- Viet-Mistral/Vistral-7B-Chat • Text Generation • 7B • Updated Feb 27, 2024 • 3.58k • 145