
Collections


Collections including paper arxiv:2501.05874

Research Papers

VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published 18 days ago • 66

Multimodal Understanding

VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published 18 days ago • 66

Papers exploring RAG techniques that combine language models with external knowledge retrieval to improve accuracy and reduce hallucinations.

VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published 18 days ago • 66

VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published 18 days ago • 66

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published 20 days ago • 48
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published 20 days ago • 42
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Paper • 2501.04003 • Published 20 days ago • 24
VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published 18 days ago • 66

2025 LLM Papers on Hugging Face with Japanese Memos

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Paper • 2501.02955 • Published 22 days ago • 40
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 26 days ago • 98
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published 6 days ago • 77
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Paper • 2501.09781 • Published 11 days ago • 22

Video Creation by Demonstration

Paper • 2412.09551 • Published Dec 12, 2024 • 8
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 45
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Paper • 2412.06531 • Published Dec 9, 2024 • 71
APOLLO: SGD-like Memory, AdamW-level Performance

Paper • 2412.05270 • Published Dec 6, 2024 • 38

gradientai/Llama-3-8B-Instruct-Gradient-1048k

Text Generation • Updated Oct 29, 2024 • 6.71k • 680
Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 91
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation

Paper • 2412.11919 • Published Dec 16, 2024 • 33
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 97

Computer vision

A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond

Paper • 2410.02362 • Published Oct 3, 2024 • 18
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

Paper • 2401.12208 • Published Jan 22, 2024 • 22
Reliable Tuberculosis Detection using Chest X-ray with Deep Learning, Segmentation and Visualization

Paper • 2007.14895 • Published Jul 29, 2020 • 1
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published Dec 24, 2024 • 71

Interesting Papers

These papers are interesting (to me)

about 12 hours ago

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Paper • 2410.02740 • Published Oct 3, 2024 • 52
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Paper • 2410.01215 • Published Oct 2, 2024 • 30
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 106
EuroLLM: Multilingual Language Models for Europe

Paper • 2409.16235 • Published Sep 24, 2024 • 26
