🎯 Introduction
A leaderboard that visualizes the vibrant HuggingFace community activity through heatmaps.
✨ Key Features
Real-time Tracking - Model/dataset/app releases from AI labs and developers
Auto Ranking - Rankings based on activity over the past year (a query sketch follows below)
🎨 Responsive UI - Unique colors per organization, mobile optimized
⚡ Auto Updates - Hourly data refresh for latest information
Major Participants
Big Tech: OpenAI, Google, Meta, Microsoft, Apple, NVIDIA
AI Startups: Anthropic, Mistral, Stability AI, Cohere, DeepSeek
Chinese Companies: Tencent, Baidu, ByteDance, Qwen
HuggingFace Official: HuggingFaceH4, HuggingFaceM4, lerobot, etc.
Active Developers: prithivMLmods, lllyasviel, multimodalart and many more
Value
Trend Analysis - Real-time open source contribution insights
Inspiration 💪 - Learn from other developers' activity patterns
Ecosystem Growth 🌱 - Visualize AI community development
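The rankings above are driven by public Hub activity. Below is a minimal sketch of how per-organization activity over the past year could be queried with huggingface_hub; the cutoff logic and org list are assumptions for illustration, not the leaderboard's actual implementation.

```python
# Sketch: count how many models an org has updated in the past year via the Hub API.
# This is an assumption about how such activity *could* be gathered,
# not the leaderboard's actual pipeline.
from datetime import datetime, timedelta, timezone

from huggingface_hub import HfApi

api = HfApi()
cutoff = datetime.now(timezone.utc) - timedelta(days=365)

def recent_model_updates(org: str) -> int:
    models = api.list_models(author=org, sort="lastModified", direction=-1, limit=500)
    return sum(1 for m in models if m.last_modified and m.last_modified >= cutoff)

for org in ["openai", "google", "deepseek-ai"]:  # example org names
    print(org, recent_model_updates(org))
```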
If you want to use ComfyUI, or SwarmUI with a ComfyUI backend, on the RunPod cloud platform, this is the most complete step-by-step tutorial you will find for installing and using both. RunPod is a great platform for scaling your AI generation, or, if you are GPU poor, for renting the very best GPUs and leveraging AI in your profession. ComfyUI is currently the leading ecosystem for image and video generation models, and with the SwarmUI interface running on top of ComfyUI you can master generative AI. First, learn how to install ComfyUI on RunPod step by step and run it. Then learn how to install SwarmUI on RunPod step by step and how to use it. Finally, learn how to connect the installed ComfyUI backend to SwarmUI and take advantage of its features, performance, and optimizations. Moreover, the installers I made install Torch 2.7, CUDA 12.8, xFormers, Sage Attention, Flash Attention, Accelerate, Triton, DeepSpeed, ComfyUI Manager and more.
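Once the installer finishes, a quick sanity check inside the pod confirms the stack is in place. This is a minimal sketch; the import names for the optional packages (e.g. flash_attn, sageattention) are assumptions and may differ depending on how the installer sets up the environment.

```python
# Minimal post-install sanity check (a sketch; optional package import names are assumptions).
import importlib

import torch

print("Torch:", torch.__version__)           # installer targets 2.7
print("CUDA runtime:", torch.version.cuda)   # installer targets 12.8
print("CUDA available:", torch.cuda.is_available())

# Components listed above; import names may differ per environment.
for pkg in ["xformers", "triton", "accelerate", "deepspeed", "flash_attn", "sageattention"]:
    try:
        mod = importlib.import_module(pkg)
        print(pkg, getattr(mod, "__version__", "installed"))
    except ImportError:
        print(pkg, "NOT installed")
```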
Qwen2.5-Omni is soooo good that people are building multimodal reasoning models on top of it 🥹
> KE-Team/Ke-Omni-R-3B is an open-source audio reasoning model, SOTA on the average of benchmarks, based on Qwen/Qwen2.5-Omni-3B 🗣️
> Haoz0206/Omni-R1 is a video reasoning model with pixel-level grounding (see below) and it's super competitive ⏯️ based on Qwen/Qwen2.5-Omni-7B
Let's refresh some fundamentals today to stay fluent in what we all work with. Here are some of the most popular model types that shape the vast world of AI (with examples in brackets):
1. LLM - Large Language Model (GPT, LLaMA) -> Large Language Models: A Survey (2402.06196) + history of LLMs: https://www.turingpost.com/t/The%20History%20of%20LLMs
Trained on massive text datasets to understand and generate human language, LLMs are mostly built on the Transformer architecture and work by predicting the next token. They scale by increasing the overall parameter count across all components (layers, attention heads, MLPs, etc.)

2. SLM - Small Language Model (TinyLLaMA, Phi models, SmolLM) -> A Survey of Small Language Models (2410.20011)
A lightweight LM optimized for efficiency, low memory use, fast inference, and edge deployment. SLMs work on the same principles as LLMs.
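Both boil down to next-token prediction. Here is a minimal generation sketch with the transformers text-generation pipeline and a small open checkpoint; the model id is just an example, any small causal LM from the Hub works.

```python
# Minimal next-token generation sketch with a small open language model
# (the checkpoint id is an example; swap in any causal LM from the Hub).
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M-Instruct")
out = generator("The Transformer architecture predicts", max_new_tokens=30, do_sample=False)
print(out[0]["generated_text"])
```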
3. VLM - Vision-Language Model (CLIP, Flamingo) -> An Introduction to Vision-Language Modeling (2405.17247) Processes and understands both images and text. VLMs map images and text into a shared embedding space or generate captions/descriptions from both
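For the shared-embedding flavor, a minimal CLIP sketch that scores how well each caption matches an image; the model id and image URL are illustrative examples.

```python
# Minimal CLIP sketch: score image-text similarity in a shared embedding space
# (model id and image URL are illustrative examples).
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open(
    requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw
)
texts = ["a photo of two cats", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # similarity of the image to each caption
print(dict(zip(texts, probs[0].tolist())))
```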
4. MLLM - Multimodal Large Language Model (Gemini) -> A Survey on Multimodal Large Language Models (2306.13549) A large-scale model that can understand and process multiple types of data (modalities), usually text plus other formats like images, videos, audio, structured data, 3D or spatial inputs. MLLMs can be LLMs extended with modality adapters or trained jointly across vision, text, audio, etc.
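The "LLM extended with a modality adapter" recipe mentioned above essentially projects encoder features into the LLM's token-embedding space. A conceptual PyTorch sketch, where all dimensions and module names are made up for illustration and do not come from any real model:

```python
# Conceptual sketch of a modality adapter: project vision features into the
# LLM's token-embedding space so they can be consumed like ordinary tokens.
# All dimensions and names are illustrative, not taken from a real model.
import torch
import torch.nn as nn

vision_dim, llm_dim = 1024, 4096   # e.g. ViT feature size -> LLM hidden size

adapter = nn.Sequential(           # a simple 2-layer MLP projector
    nn.Linear(vision_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)

vision_features = torch.randn(1, 256, vision_dim)  # 256 image patch embeddings
text_embeddings = torch.randn(1, 32, llm_dim)      # 32 text token embeddings

image_tokens = adapter(vision_features)             # now in LLM embedding space
llm_inputs = torch.cat([image_tokens, text_embeddings], dim=1)
print(llm_inputs.shape)  # (1, 288, 4096): one multimodal sequence for the LLM
```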
5. LAM - Large Action Model (InstructDiffusion, RT-2) -> Large Action Models: From Inception to Implementation (2412.10047) Understands and generates action sequences by predicting action tokens (discrete/continuous instructions) that guide agents. Trained on behavior datasets, LAMs generalize across tasks, environments, and modalities - video, sensor data, etc.
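Since LAMs emit action tokens, a common trick in RT-2-style setups is to discretize continuous control values into a small vocabulary of bins. A toy sketch of that idea; the bin count and action range are arbitrary assumptions:

```python
# Toy sketch of turning continuous actions into discrete "action tokens" and back,
# as many LAM-style models do; bin count and ranges are arbitrary assumptions.
import numpy as np

NUM_BINS = 256
LOW, HIGH = -1.0, 1.0  # assumed normalized action range

def action_to_tokens(action: np.ndarray) -> np.ndarray:
    """Map each continuous action dimension to an integer token in [0, NUM_BINS)."""
    clipped = np.clip(action, LOW, HIGH)
    return np.round((clipped - LOW) / (HIGH - LOW) * (NUM_BINS - 1)).astype(int)

def tokens_to_action(tokens: np.ndarray) -> np.ndarray:
    """Invert the discretization (up to quantization error)."""
    return tokens / (NUM_BINS - 1) * (HIGH - LOW) + LOW

action = np.array([0.12, -0.87, 0.5])  # e.g. an end-effector delta in x/y/z
tokens = action_to_tokens(action)
print(tokens, tokens_to_action(tokens))
```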
Read about LRM, MoE, SSM, RNN, CNN, SAM and LNN below.