yamayou's Collections
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Paper
•
2402.14083
•
Published
•
49
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
•
2402.17764
•
Published
•
615
Genie: Generative Interactive Environments
Paper
•
2402.15391
•
Published
•
72
Humanoid Locomotion as Next Token Prediction
Paper
•
2402.19469
•
Published
•
29
ViTAR: Vision Transformer with Any Resolution
Paper
•
2403.18361
•
Published
•
56
Simulating Classroom Education with LLM-Empowered Agents
Paper
•
2406.19226
•
Published
•
32
MIRAI: Evaluating LLM Agents for Event Forecasting
Paper
•
2407.01231
•
Published
•
18
Prithvi WxC: Foundation Model for Weather and Climate
Paper
•
2409.13598
•
Published
•
43
Selective Attention Improves Transformer
Paper
•
2410.02703
•
Published
•
24
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper
•
2411.17465
•
Published
•
87
Chimera: Improving Generalist Model with Domain-Specific Experts
Paper
•
2412.05983
•
Published
•
9
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper
•
2412.08635
•
Published
•
45
Large Action Models: From Inception to Implementation
Paper
•
2412.10047
•
Published
•
35
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
•
2412.09871
•
Published
•
101
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
Paper
•
2412.14123
•
Published
•
11
Cosmos World Foundation Model Platform for Physical AI
Paper
•
2501.03575
•
Published
•
78
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper
•
2501.04682
•
Published
•
98
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
Paper
•
2411.04983
•
Published
•
13
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper
•
2502.05171
•
Published
•
140
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Paper
•
2502.05173
•
Published
•
65
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper
•
2502.11089
•
Published
•
155
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Paper
•
2502.15007
•
Published
•
175
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Paper
•
2502.20395
•
Published
•
47
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper
•
2503.14456
•
Published
•
143
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper
•
2503.15558
•
Published
•
46
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
•
2504.01990
•
Published
•
259
Paper
•
2504.00927
•
Published
•
46
One-Minute Video Generation with Test-Time Training
Paper
•
2504.05298
•
Published
•
98
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
Paper
•
2504.08388
•
Published
•
39
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
Paper
•
2504.10157
•
Published
•
16
Adaptive Computation Pruning for the Forgetting Transformer
Paper
•
2504.06949
•
Published
•
3