Step1X-Edit: A Practical Framework for General Image Editing • Paper • 2504.17761 • Published 2 days ago • 60
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning • Paper • 2504.17192 • Published 2 days ago • 55
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs • Paper • 2504.17432 • Published 2 days ago • 30
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation • Paper • 2504.17207 • Published 2 days ago • 18
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining • Paper • 2504.16511 • Published 3 days ago • 15
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation • Paper • 2504.17502 • Published 2 days ago • 47
Distilling semantically aware orders for autoregressive image generation • Paper • 2504.17069 • Published 3 days ago • 4
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models • Paper • 2504.17789 • Published 2 days ago • 10
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs • Paper • 2504.17040 • Published 3 days ago • 8
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting • Paper • 2504.15921 • Published 4 days ago • 4
3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models • Paper • 2504.17414 • Published 2 days ago • 4
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos • Paper • 2504.17343 • Published 2 days ago • 4
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset • Paper • 2504.16891 • Published 3 days ago • 13
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models • Paper • 2504.16074 • Published 4 days ago • 29
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs • Paper • 2502.12982 • Published Feb 18 • 17
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models • Paper • 2504.15279 • Published 5 days ago • 61
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning • Paper • 2504.14509 • Published 6 days ago • 43
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model • Paper • 2504.15843 • Published 4 days ago • 16
I-Con: A Unifying Framework for Representation Learning • Paper • 2504.16929 • Published 3 days ago • 26