Jiashuo Yu

awojustin

AI & ML interests

Audio-Visual Learning, Music AI, AIGC

awojustin's activity

updated a model 29 days ago

OpenGVLab/InternVideo2-Stage2-6B-Audio

Updated 29 days ago • 1

upvoted a paper about 1 month ago

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Paper • 2411.13503 • Published Nov 20 • 30

authored 8 papers 5 months ago

InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language

Paper • 2305.05662 • Published May 9, 2023 • 4

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

Paper • 2212.03191 • Published Dec 6, 2022

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Paper • 2310.20700 • Published Oct 31, 2023 • 9

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Paper • 2406.08418 • Published Jun 12 • 28

Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization

Paper • 2207.03190 • Published Jul 7, 2022

Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection

Paper • 2207.05500 • Published Jul 12, 2022

MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing

Paper • 2111.12374 • Published Nov 24, 2021

MPN: Multimodal Parallel Network for Audio-Visual Event Localization

Paper • 2104.02971 • Published Apr 7, 2021

upvoted a paper 5 months ago

Scaling Diffusion Transformers to 16 Billion Parameters

Paper • 2407.11633 • Published Jul 16 • 25

liked a dataset 6 months ago

OpenGVLab/InternVideo2_Vid_Text

Viewer • Updated Jul 10 • 40.5M • 56 • 9

upvoted a paper 9 months ago

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22 • 22

authored a paper 9 months ago

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22 • 22

liked a Space about 1 year ago

Seamless M4T v2

Space • Runtime error • 515

authored a paper about 1 year ago

VBench: Comprehensive Benchmark Suite for Video Generative Models

Paper • 2311.17982 • Published Nov 29, 2023 • 7

upvoted a paper about 1 year ago

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Paper • 2311.17005 • Published Nov 28, 2023 • 2

liked a Space about 1 year ago

VBench

Space • Running

liked a model about 1 year ago

Vchitect/LaVie

Text-to-Video • Updated Dec 4, 2023 • 18

upvoted a paper about 1 year ago

VBench: Comprehensive Benchmark Suite for Video Generative Models

Paper • 2311.17982 • Published Nov 29, 2023 • 7