Daily Papers

by AK and the research community

Jul 8

Submitted by

UglyToilet

MemOS: A Memory OS for AI System

·
39 authors

Submitted by

Nicolas-BZRD

Should We Still Pretrain Encoders with Masked Language Modeling?

·
8 authors

Submitted by

JunhaoZhuang

4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture

·
7 authors

Submitted by

RunpeiDong

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

·
13 authors

Submitted by

RowitZou

Pre-Trained Policy Discriminators are General Reward Models

·
22 authors

Submitted by

KYLN24

BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset

·
15 authors

Submitted by

AdinaY

RoboBrain 2.0 Technical Report

·
46 authors

Submitted by

hiyouga

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

·
7 authors

Submitted by

Bibaolong

RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

·
10 authors

Submitted by

AkiCumulo

StreamDiT: Real-Time Streaming Text-to-Video Generation

·
5 authors

Submitted by

ZZXF

Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration

·
8 authors

Submitted by

xxzcc

ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation

·
32 authors

Submitted by

justinyyy

OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding

·
7 authors

Submitted by

Gigglingface

On the rankability of visual embeddings

·
3 authors

Submitted by

ziyjiang

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

·
13 authors

Submitted by

SteveZeyuZhang

PresentAgent: Multimodal Agent for Presentation Video Generation

·
7 authors

Submitted by

cedricbonhomme

VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification

·
2 authors

Submitted by

danielchyeh

Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing

·
7 authors

Submitted by

jannalu

Evaluating LLMs on Real-World Forecasting Against Human Superforecasters

·
1 author

Submitted by

amanchadha

MOD-X: A Modular Open Decentralized eXchange Framework proposal for Heterogeneous Interoperable Artificial Agents

·
5 authors

Submitted by

ashutosh1919

Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

·
3 authors

Submitted by

yuanze1024

SeqTex: Generate Mesh Textures in Video Sequence

·
7 authors
