StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs Paper • 2505.20139 • Published 23 days ago • 18
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 39
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks Paper • 2410.05160 • Published Oct 7, 2024 • 4
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published Jun 21, 2024 • 18
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published Jun 21, 2024 • 65