Collections
Discover the best community collections!
Collections including paper arxiv:2501.04652
-
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 92 -
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper • 2501.01257 • Published • 45 -
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Paper • 2501.01423 • Published • 34 -
REDUCIO! Generating 1024times1024 Video within 16 Seconds using Extremely Compressed Motion Latents
Paper • 2411.13552 • Published
-
On Domain-Specific Post-Training for Multimodal Large Language Models
Paper • 2411.19930 • Published • 25 -
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper • 2412.10704 • Published • 15 -
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Paper • 2501.04652 • Published • 8 -
M-A-D/Mixed-Arabic-Datasets-Repo
Viewer • Updated • 209M • 2.3k • 29
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 35 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 50 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 65 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 44
-
Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study
Paper • 2409.17580 • Published • 8 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 44 -
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Paper • 2501.04652 • Published • 8
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 33 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 26 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 122 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 21
-
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Paper • 2405.19893 • Published • 31 -
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries
Paper • 2406.12824 • Published • 20 -
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
Paper • 2406.15319 • Published • 62 -
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems
Paper • 2406.14972 • Published • 7