Oliver Wei (Oliver2021)

AI & ML interests: None yet

Recent Activity
- updated a collection: Video-gen (2 days ago)
- liked a dataset: Team-ACE/ToolACE (9 days ago)
- liked a dataset: togethercomputer/glaive-function-calling-v2-formatted (9 days ago)

Organizations: None yet

Collections
Image-gen
MLLM
LLM understanding
MM-EVAL
- MMRA: A Benchmark for Multi-granularity Multi-image Relational Association (Paper • 2407.17379 • Published • 3)
- MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines (Paper • 2409.12959 • Published • 38)
- MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks (Paper • 2505.16459 • Published • 45)
- VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? (Paper • 2505.23359 • Published • 39)
MMLM
- Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models (Paper • 2404.13013 • Published • 32)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing (Paper • 2404.12253 • Published • 56)
- Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity (Paper • 2403.12267 • Published)
- No More Adam: Learning Rate Scaling at Initialization is All You Need (Paper • 2412.11768 • Published • 44)
Video-gen
- Long-Context Autoregressive Video Modeling with Next-Frame Prediction (Paper • 2503.19325 • Published • 73)
- Seedance 1.0: Exploring the Boundaries of Video Generation Models (Paper • 2506.09113 • Published • 90)
- Discrete Diffusion in Large Language and Multimodal Models: A Survey (Paper • 2506.13759 • Published • 41)
Agent
Long context
- InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU (Paper • 2502.08910 • Published • 149)
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity (Paper • 2502.13063 • Published • 73)
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (Paper • 2502.11089 • Published • 160)
- LLM Pretraining with Continuous Concepts (Paper • 2502.08524 • Published • 29)
RAG
reasoning
- URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (Paper • 2501.04686 • Published • 53)
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models (Paper • 2501.09686 • Published • 41)
- LLaVA-o1: Let Vision Language Models Reason Step-by-Step (Paper • 2411.10440 • Published • 126)
- TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding (Paper • 2502.19400 • Published • 49)
VLA