Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published Apr 1 • 15
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published Mar 6 • 25
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published Jan 14 • 34
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models Paper • 2412.01822 • Published Dec 2, 2024 • 15
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data Paper • 2410.02056 • Published Oct 2, 2024 • 6
Towards Robust Speech Representation Learning for Thousands of Languages Paper • 2407.00837 • Published Jun 30, 2024 • 11
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark Paper • 2305.10615 • Published May 18, 2023 • 1
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning Paper • 2309.15317 • Published Sep 26, 2023
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data Paper • 2309.13876 • Published Sep 25, 2023 • 1
Improving Massively Multilingual ASR With Auxiliary CTC Objectives Paper • 2302.12829 • Published Feb 24, 2023
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer Paper • 2401.16658 • Published Jan 30, 2024 • 14
YODAS: Youtube-Oriented Dataset for Audio and Speech Paper • 2406.00899 • Published Jun 2, 2024 • 3
A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection Paper • 2307.03759 • Published Jul 7, 2023 • 1
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models Paper • 2310.01728 • Published Oct 3, 2023
WeaverBird: Empowering Financial Decision-Making with Large Language Model, Knowledge Base, and Search Engine Paper • 2308.05361 • Published Aug 10, 2023