Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8 • 107 • 7
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22 • 19 • 4
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22 • 19 • 4
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels Paper • 2312.17090 • Published Dec 28, 2023 • 4 • 3
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels Paper • 2312.17090 • Published Dec 28, 2023 • 4 • 3
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels Paper • 2312.17090 • Published Dec 28, 2023 • 4 • 3
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 26 • 2