Daily Papers

by AK and the research community

Mar 12

Submitted by

tellarin

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

92 authors

Submitted by

ColeYzzzz

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

10 authors

Submitted by

a43992899

YuE: Scaling Open Foundation Models for Long-Form Music Generation

57 authors

Submitted by

Xuerui123

UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

8 authors

Submitted by

Owen777

MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice

13 authors

Submitted by

Z-MU-Z

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

8 authors

Submitted by

wujie10

Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

28 authors

Submitted by

akhaliq

Gemini Embedding: Generalizable Embeddings from Gemini

47 authors

Submitted by

hsaest

Implicit Reasoning in Transformers is Reasoning through Shortcuts

4 authors

Submitted by

Harold328

LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization

11 authors

Submitted by

subin-kim

Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling

5 authors

Submitted by

CohenQu

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

7 authors

Submitted by

LegendBC

OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models

5 authors

Submitted by

Jianxiong

CineBrain: A Large-Scale Multi-Modal Brain Dataset During Naturalistic Audiovisual Narrative Processing

5 authors

Submitted by

jmhb

Video Action Differencing

8 authors

Submitted by

MaverickAlex

^RFLAV: Rolling Flow matching for infinite Audio Video generation

7 authors

Submitted by

xwen99

"Principal Components" Enable A New Language of Images

5 authors

Submitted by

KID-22

Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents

9 authors

Submitted by

XinXuNLPer

BiasEdit: Debiasing Stereotyped Language Models via Model Editing

4 authors

Submitted by

RohamKoohestani

Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol

3 authors

Submitted by

Jinfa

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension

11 authors

Submitted by

XiaXin-Aloys

RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories

6 authors

Submitted by

jingtao

Evaluating Intelligence via Trial and Error

10 authors

Submitted by

Mountchicken

Referring to Any Person

8 authors

Submitted by

kwanY

AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models

4 authors

Submitted by

Tvaranka

NullFace: Training-Free Localized Face Anonymization

4 authors

Submitted by

shangjingbo

AI-native Memory 2.0: Second Me

5 authors

Submitted by

luoyingfeng

Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

11 authors

Submitted by

WYLing

VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering

10 authors

Submitted by

adamdad

Mixture of Experts Made Intrinsically Interpretable

7 authors