Collections including paper arxiv:2503.20215
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 27
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 43
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 22
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 182
- COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
  Paper • 2401.00849 • Published • 17
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 50
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 41
- Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
  Paper • 2503.16870 • Published • 5
- Gemma 3 Technical Report
  Paper • 2503.19786 • Published • 39
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 112
- Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
  Paper • 2503.19855 • Published • 24
- Reinforcement Learning: An Overview
  Paper • 2412.05265 • Published • 5
- Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
  Paper • 2411.01156 • Published • 6
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
  Paper • 2503.21755 • Published • 30
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 112
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
  Paper • 2503.17352 • Published • 21
- When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
  Paper • 2503.16660 • Published • 70
- CoMP: Continual Multimodal Pre-training for Vision Foundation Models
  Paper • 2503.18931 • Published • 29
- MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
  Paper • 2503.13964 • Published • 16
- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
  Paper • 2503.10615 • Published • 16
- UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
  Paper • 2503.10630 • Published • 6
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 27
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 83
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
  Paper • 2408.03314 • Published • 59
- TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
  Paper • 2502.15425 • Published • 9
- EgoLife: Towards Egocentric Life Assistant
  Paper • 2503.03803 • Published • 38
- Visual-RFT: Visual Reinforcement Fine-Tuning
  Paper • 2503.01785 • Published • 74