Stalin16's Collections
Data and other things
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Paper
•
2412.14475
•
Published
•
55
How to Synthesize Text Data without Model Collapse?
Paper
•
2412.14689
•
Published
•
53
Token-Budget-Aware LLM Reasoning
Paper
•
2412.18547
•
Published
•
47
WavePulse: Real-time Content Analytics of Radio Livestreams
Paper
•
2412.17998
•
Published
•
11
Bridging the Data Provenance Gap Across Text, Speech and Video
Paper
•
2412.17847
•
Published
•
9
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper
•
2412.11768
•
Published
•
44
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper
•
2501.00958
•
Published
•
107
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Paper
•
2501.04686
•
Published
•
54
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper
•
2501.00192
•
Published
•
31
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Paper
•
2501.09751
•
Published
•
49
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
Paper
•
2501.18511
•
Published
•
20
LIMO: Less is More for Reasoning
Paper
•
2502.03387
•
Published
•
61
Scaling Pre-training to One Hundred Billion Data for Vision Language Models
Paper
•
2502.07617
•
Published
•
29
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
Paper
•
2502.05003
•
Published
•
44
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation
Paper
•
2502.07870
•
Published
•
45
Jailbreaking to Jailbreak
Paper
•
2502.09638
•
Published
•
4
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Paper
•
2502.14846
•
Published
•
13
Paper
•
2503.08507
•
Published
•
7
"Principal Components" Enable A New Language of Images
Paper
•
2503.08685
•
Published
•
12
YuE: Scaling Open Foundation Models for Long-Form Music Generation
Paper
•
2503.08638
•
Published
•
63
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
Paper
•
2503.07920
•
Published
•
97
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
Paper
•
2503.24379
•
Published
•
75
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Paper
•
2504.00072
•
Published
•
7
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
•
2504.01990
•
Published
•
259
URECA: Unique Region Caption Anything
Paper
•
2504.05305
•
Published
•
34
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Paper
•
2504.09081
•
Published
•
17