SmolLM2 • Collection • State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 8 items
Building and better understanding vision-language models: insights and future directions • Paper • 2408.12637 • Published Aug 22
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos • Paper • 2407.12679 • Published Jul 17
Towards Retrieval Augmented Generation over Large Video Libraries • Paper • 2406.14938 • Published Jun 21
Inserting Faces inside Captions: Image Captioning with Attention Guided Merging • Paper • 2405.02305 • Published Mar 20