Open LLM Leaderboard

https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard

Activity Feed

AI & ML interests

Evaluating open LLMs

AdinaY

posted an update 1 day ago

Kimi-K2 is now available on the Hub 🔥🚀
This is a trillion-parameter MoE model focused on long context, code, reasoning, and agentic behavior.

moonshotai/kimi-k2-6871243b990f2af5ba60617d

✨ Base & Instruct
✨ 1T total / 32B active - Modified MIT License
✨ 128K context length
✨ Muon optimizer for stable trillion-scale training
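
The "1T total / 32B active" figure describes MoE sparsity: only a subset of the parameters is used for each token. As a rough illustration (this calculation is not part of the post), the active share can be worked out from the two numbers. The sketch below assumes "1T" means 1000B, borrows the Hunyuan-A13B figures (80B total / 13B active) from a later post in this feed, and uses a made-up helper name, `active_fraction`.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters active per token (sizes in billions)."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


# (total, active) in billions of parameters, as quoted in the feed.
# "1T" is taken to mean 1000B, which is an assumption.
models = {
    "Kimi-K2": (1000.0, 32.0),
    "Hunyuan-A13B": (80.0, 13.0),
}

for name, (total_b, active_b) in models.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.2%} of parameters active per token")
```

The ratio says nothing about quality; it only indicates how much of the model is exercised per token, which is one reason a very large MoE model can have inference cost closer to that of a much smaller dense model.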

albertvillanova

posted an update 2 days ago

🚀 New in smolagents v1.20.0: Remote Python Execution via WebAssembly (Wasm)

We've just merged a major new capability into the smolagents framework: the CodeAgent can now execute Python code remotely in a secure, sandboxed WebAssembly environment!

🔧 Powered by Pyodide and Deno, this new WasmExecutor lets your agent-generated Python code run safely, without relying on Docker or local execution.

Why this matters:
✅ Isolated execution = no host access
✅ No need for Python on the user's machine
✅ Safer evaluation of arbitrary code
✅ Compatible with serverless / edge agent workloads
✅ Ideal for constrained or untrusted environments

This is just the beginning: a focused initial implementation with known limitations, and a solid MVP designed for secure, sandboxed use cases. 💡

💡 We're inviting the open-source community to help evolve this executor:
• Tackle more advanced Python features
• Expand compatibility
• Add test coverage
• Shape the next-gen secure agent runtime

🔗 Check out the PR: https://github.com/huggingface/smolagents/pull/1261

Let's reimagine what agent-driven Python execution can look like: remote-first, wasm-secure, and community-built.

This feature is live in smolagents v1.20.0!
Try it out.
Break things. Extend it. Give us feedback.
Let's build safer, smarter agents together 🧠⚙️

👉 https://github.com/huggingface/smolagents/releases/tag/v1.20.0

#smolagents #WebAssembly #Python #AIagents #Pyodide #Deno #OpenSource #HuggingFace #AgenticAI

AdinaY

posted an update 5 days ago

The tech report of RoboBrain 2.0 is now available on the Daily Papers page🔥

It's an embodied brain model that sees, thinks, and plans for many robots.

Leave your insights or questions; the authors are happy to respond.
RoboBrain 2.0 Technical Report (2507.02029)

AdinaY

posted an update 5 days ago

Skywork-Reward-V2🔥 Reward models by Skywork AI.

Skywork/skywork-reward-v2-685cc86ce5d9c9e4be500c84

✨ 0.6B - 8B
✨ Trained on 26M human-LLM preference pairs
✨ 0.6B > 27B in many tasks

AdinaY

posted an update 5 days ago

POLAR🐻‍❄️ New reward modeling by Shanghai AI Lab

internlm/polar-68693f829d2e83ac5e6e124a

✨ 1.8B/7B - Apache 2.0
✨ Scalable policy discriminative pretraining
✨ Easy RLHF with minimal preference data

AdinaY

posted an update 10 days ago

The Chinese Open Source Heatmap is live 🔥
You can now track the companies, research labs, and communities powering China’s open source AI movement.

zh-ai-community/model-release-heatmap-zh

Some highlights:

✨Tech giants are investing more in open source.
-Alibaba: Full-stack open ecosystem
-Tencent: Hunyuan image/video/3D
-Bytedance: Catching up fast in 2025
-Baidu: New player in open LLM

✨New players are emerging after the DeepSeek moment.
-Xiaomi
-Red Note
-Bilibili
-MiniMax
-Moonshot AI

✨The startup list is shifting fast! Those who find a direction aligned with their strengths are the ones who endure.
-DeepSeek
-MiniMax
-StepFun
-Moonshot AI
-Zhipu AI
-OpenBMB

✨Research labs & communities are making key contributions.
-BAAI
-Shanghai AI Lab
-OpenMOSS
-MAP

AdinaY

posted an update 11 days ago

🔥 June highlights from China’s open source ecosystem.

zh-ai-community/june-2025-open-works-from-the-chinese-community-683d66c188f782dc5570ba15

✨Baidu & MiniMax both launched open foundation models
- Baidu: ERNIE 4.5 (from 0.3B to 424B) 🤯
- MiniMax: MiniMax-M1 (hybrid MoE reasoning model)

✨Multimodal AI is moving from fusion to full-stack reasoning: unified Any-to-Any pipelines across text, vision, audio, and 3D
- Baidu: ERNIE-4.5-VL-424B
- Moonshot AI: Kimi-VL-A3B
- Alibaba: Ovis-U1
- BAAI: Video-XL-2/OmniGen2
- AntGroup: Ming-Lite-Omni
- Chinese Academy of Sciences: Stream-Omni
- Bytedance: SeedVR2-3B
- Tencent: Hunyuan 3D 2.1 / SongGeneration
- FishAudio: Openaudio-s1-mini

✨Domain specific models are rapidly emerging
- Alibaba DAMO: Lingshu-7B (medical MLLM)
- BAAI: RoboBrain (Robotics)

✨ So many small models!
- OpenBMB: MiniCPM4 (on-device)
- Qwen: Embedding/Reranker (0.6B)
- Alibaba: Ovis-U1-3B
- Moonshot AI: Kimi-VL-A3B
- Bytedance: SeedVR2-3B

AdinaY

posted an update 11 days ago

MTVCraft 🔥 Veo 3-style audio-video model by BAAI

Model:
BAAI/MTVCraft
Demo:
BAAI/MTVCraft

✨ Text > [Speech + SFX + BGM] > Synchronized Video
✨ Built with Qwen3 + ElevenLabs + MTV

AdinaY

posted an update 11 days ago

GLM-4.1V-Thinking 🔥 New open vision reasoning model by Zhipu AI

THUDM/glm-41v-thinking-6862bbfc44593a8601c2578d

✨ 9B base & Thinking - MIT license
✨ CoT + RL with Curriculum Sampling
✨ 64k context, 4K image, any aspect ratio
✨ Support English & Chinese
✨ Outperforms GPT-4o (2024-11-20)

AdinaY

posted an update 13 days ago

Pangu Pro MoE 🔥 Huawei's first open model!

Paper:
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity (2505.21411)
Model:
https://gitcode.com/ascend-tribe/pangu-pro-moe-model

✨ MoGE: Mixture of Grouped Experts
✨ 16B activated params - 48 layers
✨ Trained on 15T tokens
✨ Natively optimized for Ascend hardware

AdinaY

posted an update 13 days ago

Baidu kept its promise, releasing 10 open models on the very last day of June🚀 Let's meet ERNIE 4.5 🔥

baidu/ernie-45-6861cd4c9be84540645f35c9

✨ From 0.3B to 424B total params
✨ Includes 47B & 3B active param MoE models + a 0.3B dense model
✨ Apache 2.0
✨ 128K context length
✨ Text+Vision co-training with ViT & UPO

AdinaY

posted an update 16 days ago

Hunyuan-A13B 🔥 New MoE LLM by TencentHunyuan

tencent/Hunyuan-A13B-Instruct

✨80B total / 13B active params
✨256K context window
✨Dual-mode reasoning: fast & slow thinking
✨Efficient inference (GQA + quantization)

thomwolf

authored a paper 16 days ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published 17 days ago • 61

lvwerra

authored a paper 16 days ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published 17 days ago • 61

hynky

authored a paper 16 days ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published 17 days ago • 61

freddyaboulton

posted an update 18 days ago

The new multimodalart/self-forcing model and demo are truly impressive!

AdinaY

posted an update 19 days ago

LongWriter-Zero 🔥 A purely RL-trained LLM by Tsinghua University that handles coherent passages of 10K+ tokens

Model:
THU-KEG/LongWriter-Zero-32B
Dataset:
THU-KEG/LongWriter-Zero-RLData
Paper:
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning (2506.18841)

✨ 32B
✨ Multi-reward GRPO: length, fluency, structure, non-redundancy
✨ Enforces <think><answer> format via Format RM
✨ Built on Qwen2.5-32B-base
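
The "Multi-reward GRPO" and "<think><answer> format" bullets can be illustrated with a small sketch. This is not the LongWriter-Zero implementation: the regex check is only a rule-based stand-in for the learned Format RM, averaging the reward components with equal weight is an assumption, and all function names and scores are hypothetical. The one piece taken from the general GRPO recipe is normalizing each reward against the mean and standard deviation of its sampling group.

```python
from __future__ import annotations

import re
from statistics import mean, pstdev

# Rule-based stand-in for a format reward model (assumption, not the paper's Format RM).
FORMAT_RE = re.compile(
    r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*", re.DOTALL
)


def format_ok(text: str) -> bool:
    """True if the text is a <think> block followed by an <answer> block, with nothing else around them."""
    return FORMAT_RE.fullmatch(text) is not None


def combined_reward(scores: dict[str, float]) -> float:
    """Average the reward components (equal weighting is an assumption)."""
    return mean(list(scores.values()))


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each reward within its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for three completions sampled from the same prompt.
group = [
    {"length": 0.9, "fluency": 0.8, "structure": 0.7, "non_redundancy": 0.8},
    {"length": 0.4, "fluency": 0.7, "structure": 0.5, "non_redundancy": 0.6},
    {"length": 0.6, "fluency": 0.6, "structure": 0.6, "non_redundancy": 0.6},
]
rewards = [combined_reward(scores) for scores in group]
print(group_advantages(rewards))
print(format_ok("<think>plan</think><answer>essay</answer>"))
```

Completions that score above their group's average get a positive advantage and those below get a negative one, so no separate value network is needed to provide a baseline.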

AdinaY

posted an update 19 days ago

MOSS-TTSD 🔊 Bilingual text-to-spoken dialogue model by Fudan University - Open MOSS team.

Model:
fnlp/MOSS-TTSD-v0
Demo:
fnlp/MOSS-TTSD

✨ Supports Chinese & English
✨ Zero-shot 2-speaker voice cloning
✨ Long-form generation (up to 960s)
✨ Built on Qwen 3

albertvillanova

posted an update 19 days ago

🚀 SmolAgents v1.19.0 is live!
This release brings major improvements to agent flexibility, UI usability, streaming architecture, and developer experience, making it easier than ever to build smart, interactive AI agents. Here's what's new:

🔧 Agent Upgrades
- Support for managed agents in ToolCallingAgent
- Context manager support for cleaner agent lifecycle handling
- Output formatting now uses XML tags for consistency

🖥️ UI Enhancements
- GradioUI now supports reset_agent_memory: perfect for fresh starts in dev & demos.

🔄 Streaming Refactor
- Streaming event aggregation moved off the Model class
- ➡️ Better architecture & maintainability

📦 Output Tracking
- CodeAgent outputs are now stored in ActionStep
- ✅ More visibility and structure to agent decisions

🐛 Bug Fixes
- Smarter planning logic
- Cleaner Docker logs
- Better prompt formatting for additional_args
- Safer internal functions and final answer matching

📚 Docs Improvements
- Added quickstart examples with tool usage
- One-click Colab launch buttons
- Expanded reference docs (AgentMemory, GradioUI docstrings)
- Fixed broken links and migrated to .md format

🔗 Full release notes:
https://github.com/huggingface/smolagents/releases/tag/v1.19.0

💬 Try it out, explore the new features, and let us know what you build!

#smolagents #opensource #AIagents #LLM #HuggingFace

AdinaY

posted an update 20 days ago

Skywork-SWE 🔥 New code agent model by Skywork 天工

Skywork/Skywork-SWE-32B

✨ 32B - Apache 2.0
✨ 38.0% pass@1 on SWE-bench Verified
✨ Up to 47.0% with test-time scaling
✨ Shows clear data scaling law (8K+ demos)
✨ Built on Qwen2.5-Coder-32B + OpenHands

Team members 18
