Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • 5 days ago • 288
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • 1 day ago • 68
Article Blazingly fast whisper transcriptions with Inference Endpoints By mfuntowicz and 5 others • 4 days ago • 56
Article Finally, a Replacement for BERT: Introducing ModernBERT By bclavie and 14 others • Dec 19, 2024 • 629
Kokoro TTS Collection Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers amazing quality. • 4 items • Updated Feb 28 • 6
Parakeet Collection NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 11 items • Updated 4 days ago • 24
3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models Paper • 2504.17414 • Published 22 days ago • 17
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float Paper • 2504.11651 • Published about 1 month ago • 28
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published 24 days ago • 61
Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 7 days ago • 49