SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14, 2025
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025
GACELA -- A generative adversarial context encoder for long audio inpainting Paper • 2005.05032 • Published May 11, 2020
Adversarial Generation of Time-Frequency Features with application in audio synthesis Paper • 1902.04072 • Published Feb 11, 2019
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024