Article Transformers.js v3: WebGPU support, new models & tasks, and more… Oct 22, 2024 • 73
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol Paper • 2503.05860 • Published Mar 7 • 10
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models Paper • 2503.08417 • Published Mar 11 • 8
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 97
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10 • 44
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published Mar 10 • 85
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding Paper • 2501.18362 • Published Jan 30 • 22
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22 • 91
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 386
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published Jan 22 • 61
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147
Large Action Models: From Inception to Implementation Paper • 2412.10047 • Published Dec 13, 2024 • 35
PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations Paper • 2412.05994 • Published Dec 8, 2024 • 18
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 99