Article Kimi-VL-A3B-Thinking-2506: A Quick Navigation By moonshotai and 1 other • Jun 21 • 66
Article SmolVLM2: Bringing Video Understanding to Every Device By orrzohar and 6 others • Feb 20 • 291
StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer Paper • 2501.11319 • Published Jan 20 • 1
Article DeepSearch Using Visual RAG in Agentic Frameworks By paultltc and 1 other • Mar 21 • 35
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14 • 114
Article Open-Source Handwritten Signature Detection Model By samuellimabraz • Mar 14 • 116
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14, 2024 • 57
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 98
SLIM Models Collection Structured Language Instruction Models (SLIMs) • 31 items • Updated Feb 10 • 32
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 628
Granite 3.0 Language Models Collection A series of language models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated May 2 • 98
SLIM GGUF Collection Quantized GGUF 'tool' implementations of SLIM Models • 30 items • Updated Feb 23 • 12
Article TTS Arena: Benchmarking Text-to-Speech Models in the Wild By mrfakename and 6 others • Feb 27, 2024 • 71
Open-source speech datasets annotated using Data-Speech Collection Open-source annotated speech datasets ranging from 1,000 hours to 45,000 hours. • 11 items • Updated Aug 8, 2024 • 5
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 624
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Paper • 2402.15504 • Published Feb 23, 2024 • 23
Industry BERT Models Collection Industry and specialized domain finetuned BERT embedding models • 6 items • Updated May 28 • 8
TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation Paper • 2401.14373 • Published Jan 25, 2024 • 11
InstantID: Zero-shot Identity-Preserving Generation in Seconds Paper • 2401.07519 • Published Jan 15, 2024 • 58