SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published Oct 21, 2024 • 70
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching Paper • 2506.20480 • Published Jun 25 • 7
Parallelizing Linear Transformers with the Delta Rule over Sequence Length Paper • 2406.06484 • Published Jun 10, 2024 • 4
Article Stable Diffusion with 🧨 Diffusers By valhalla and 3 others • Aug 22, 2022 • 66
Essential-Web v1.0: 24T tokens of organized web data Paper • 2506.14111 • Published Jun 17 • 41
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19, 2024 • 59
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark Paper • 2505.16968 • Published May 22 • 41
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents Paper • 2505.24878 • Published May 30 • 23
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 176
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization Paper • 2505.24862 • Published May 30 • 31
CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects Paper • 2505.21437 • Published May 27 • 22
Time Blindness: Why Video-Language Models Can't See What Humans Can? Paper • 2505.24867 • Published May 30 • 80
SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem Paper • 2505.21887 • Published May 28 • 14
CASS Collection Large-scale dataset and model suite for cross-architecture GPU code transpilation between CUDA and HIP at both source and assembly levels • 2 items • Updated May 15 • 5
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 491