yamayou's Collections
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Paper • 2402.14083 • Published • 47
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 609
Genie: Generative Interactive Environments
Paper • 2402.15391 • Published • 71
Humanoid Locomotion as Next Token Prediction
Paper • 2402.19469 • Published • 27
ViTAR: Vision Transformer with Any Resolution
Paper • 2403.18361 • Published • 54
Simulating Classroom Education with LLM-Empowered Agents
Paper • 2406.19226 • Published • 31
MIRAI: Evaluating LLM Agents for Event Forecasting
Paper • 2407.01231 • Published • 17
Prithvi WxC: Foundation Model for Weather and Climate
Paper • 2409.13598 • Published • 42
Selective Attention Improves Transformer
Paper • 2410.02703 • Published • 24
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published • 80
Chimera: Improving Generalist Model with Domain-Specific Experts
Paper • 2412.05983 • Published • 9
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper • 2412.08635 • Published • 44
Large Action Models: From Inception to Implementation
Paper • 2412.10047 • Published • 32
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 92
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
Paper • 2412.14123 • Published • 11
Cosmos World Foundation Model Platform for Physical AI
Paper • 2501.03575 • Published • 69
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper • 2501.04682 • Published • 90
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
Paper • 2411.04983 • Published • 11
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 113
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Paper • 2502.05173 • Published • 60