AI & ML interests

None defined yet.

burtenshawΒ 
posted an update 2 days ago
view post
Post
1940
The open source AI community is just made of people who are passionate and care about their work. So we thought it would be cool to share our favourite icons of the community with a fun award.

Winners get free Hugging Face Pro Subscriptions, Merchandise, or compute credits for the hub.

πŸ”— Follow and nominate here: community-spotlight

This is a new initiative to recognise and celebrate the incredible work being done by community members. It's all about inspiring more collaboration and innovation in the world of machine learning and AI.

They're highlighting contributors in four key areas:
- model creators: building and sharing innovative and state-of-the-art models.
- educators: sharing knowledge through posts, articles, demos, and events.
- tool builders: creating the libraries, frameworks, and applications that we all use.
- community champions: supporting and mentoring others in forums.

Know someone who deserves recognition? Nominate them by opening a post in the Hugging Face community forum.
  • 1 reply
Β·
eliebakΒ 
posted an update 5 days ago
view post
Post
2845
Super excited to announce that our research team at Hugging Face will be doing an AMA on reddit r/LocalLLaMA.

Come ask any questions to the team behind SmolLM, FineWeb and more! And who knows, maybe there’ll be a shiny new release to talk about?

Thursday 4th September, 8AM-11AM PST πŸ€—

science
eliebakΒ 
posted an update 14 days ago
view post
Post
534
Motif 2.6B tech report is pretty insane, first time i see a model with differential attention and polynorm trained at scale!

> It's trained on 2.5T of token, with a "data mixture schedule" to continuously adjust the mixture over training.
> They use WSD with a "Simple moving average" averaging the last 6 ckpt every 8B token.
> They trained on Finemath, Fineweb2, DCLM, TxT360.
> Lot of details in the finetuning data they used, for instance they used EvolKit and did some "dataset fusion" to have more compressed knowledge into the data.
> They mention they also tried Normalized GPT, QK-Norm and Cross Layer Attention.

Motif-Technologies/Motif-2.6B
anditoΒ 
posted an update about 2 months ago
view post
Post
2840
Many VLMs claim to process hours of video. But can they follow the story?πŸ€”
Today, we introduce TimeScope: The benchmark that separates true temporal understanding from marketing hype. Let's see how much VLMs really understand!⏳

We test three skills that matter for real-world use:
πŸ”Ž Localized Retrieval: Find a specific action.
🧩 Information Synthesis: Piece together scattered clues.
πŸƒ Fine-Grained Perception: Analyze detailed motion (e.g., count how many times a person swings an axe).

The results are in, and they're revealing. Only Gemini 2.5 pro handles 1-hour-long videos.
Performance drops sharply with duration, proving that long video understanding is still challenging. We've found the breaking pointsβ€”now the community can start fixing them.πŸ“ˆ

Want to learn more? TimeScope is 100% open-source. Benchmark your model and help us build the next generation of video AI.

πŸ“– Blog:
https://huggingface.co/blog/timescope-video-lmm-benchmark
πŸ‘©β€πŸ’» Leaderboard & Demo: Apollo-LMMs/TimeScope
πŸ“Š Dataset: Apollo-LMMs/TimeScope
βš™οΈ Eval Code: https://github.com/EvolvingLMMs-Lab/lmms-eval
eliebakΒ 
posted an update about 2 months ago
view post
Post
4701
Kimi K2 tech report is full of gems as always. Here are my notes on it:

> MuonClip: Pretty crazy how after 70k the training stabilizes and the QK-clip is basically inactive. There is also no loss in perf with QK-clip which is not trivial at all (at small scale but with aggressive threshold). Also a cool explanation of why muon makes the logit explode in appendix E (tl;dr is that muon makes the singular value of the update matrix higher)
> Sparsity scaling laws to justify their ratio, they have a very solid training infra that allows the model to be trained at this sparsity level, they could have increased even more but as sparsity increases the training becomes less efficient.
> They diminish the number of attention heads to make it more efficient for long context since attention heads are a big bottleneck for long context. They also remove 2 of the 3 "first dense" layers in the dsv3 arch.

With the sparsity and attention heads (divided by 2) they achieve 83% increased flops compared to deepseek v3 arch at 128k.

> Data: Rephrasing is KEY. They do a lot more synthetic data generation and rephrase their corpus to have different styles, for longer documents they do it by chunk. I'm (half) surprised by the fact that ONLY 1 epoch (assuming same number of training tokens I think?) of data rephrased 10 times has better accuracy than 10 epochs of the same data rephrased once.
> They do rewriting for Math and Knowledge, for Math they apply the ShallowMath recipe and instruct the model to rephrase in a "learning note" style
> They talk about diversity and probably have some internal stuff/eval to test that, as always still a bit unclear for me how to properly measure that.

The infra is also very nice, quick summary:
> PP=16 (1F1B schedule, a bit custom), EP=16, zero1
> No FP8 computation but for storage of specific layers, selective recomputation for inexpensive block, activation offloading to CPU
burtenshawΒ 
posted an update about 2 months ago
view post
Post
1380
Kimi-K2 is ready for general use! In these notebooks I walk you through use cases like function calling and structured outputs.

πŸ”— burtenshaw/Kimi-K2-notebooks

You can swap it into any OpenAI compatible application via Inference Providers and get to work with an open source model.
  • 1 reply
Β·
anditoΒ 
posted an update 2 months ago
view post
Post
3998
πŸ§ πŸ‘οΈ Can AI visualize solutions?

Humans often solve visual problems by sketching ideas in our minds. What if Vision-Language Models (VLMs) could do something similar, not by generating full images, but by using internal β€œmental sketches”?

That’s the idea behind Mirage, a new framework that empowers VLMs to reason using latent visual tokens. Instead of just thinking in words, Mirage mixes in abstract visual representations that help the model solve complex tasks.

These aren't photorealistic images. They're compact, internal representations optimized purely to support reasoning.

πŸ”§ Mirage is trained in two phases:

1) Grounding: It learns to produce latent tokens anchored in real images.
2) Refinement: The model drops the images and learns to generate visual tokens on its own.

πŸ“ˆ And yes, it works!
On challenging benchmarks like Visual Spatial Planning, Jigsaw puzzles, and Spatial Attention Tasks, Mirage clearly outperforms GPT-4o and other strong baselines.
Smart sketches > empty words.

By mimicking the way humans visualize solutions, Mirage gives AI a new kind of imagination, one that’s faster, more efficient, and more human-like.
Kudos to the teams at UMass Amherst and MIT behind this exciting work.
Check the paper: Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (2506.17218)
Β·
burtenshawΒ 
posted an update 2 months ago
view post
Post
2932
Inference for generative ai models looks like a mine field, but there’s a simple protocol for picking the best inference:

🌍 95% of users >> If you’re using open (large) models and need fast online inference, then use Inference providers on auto mode, and let it choose the best provider for the model. https://huggingface.co/docs/inference-providers/index

πŸ‘· fine-tuners/ bespoke >> If you’ve got custom setups, use Inference Endpoints to define a configuration from AWS, Azure, GCP. https://endpoints.huggingface.co/

🦫 Locals >> If you’re trying to stretch everything you can out of a server or local machine, use Llama.cpp, Jan, LMStudio or vLLM. https://huggingface.co/settings/local-apps#local-apps

πŸͺŸ Browsers >> If you need open models running right here in the browser, use transformers.js. https://github.com/huggingface/transformers.js

Let me know what you’re using, and if you think it’s more complex than this.
burtenshawΒ 
posted an update 3 months ago
view post
Post
1022
You don't need remote APIs for a coding copliot, or the MCP Course! Set up a fully local IDE with MCP integration using Continue. In this tutorial Continue guides you through setting it up.

This is what you need to do to take control of your copilot:

1. Get the Continue extension from the [VS Code marketplace](https://marketplace.visualstudio.com/items?itemName=Continue.continue) to serve as the AI coding assistant.

2. Serve the model with an OpenAI compatible server in Llama.cpp / LmStudio/ etc.

llama-server -hf unsloth/Devstral-Small-2505-GGUF:Q4_K_M

3. Create a .continue/models/llama-max.yaml file in your project to tell Continue how to use the local Ollama model.
name: Llama.cpp model
    version: 0.0.1
    schema: v1
    models:
      - provider: llama.cpp
        model: unsloth/Devstral-Small-2505-GGUF
        apiBase: http://localhost:8080
        defaultCompletionOptions:
          contextLength: 8192 
    # Adjust based on the model
        name: Llama.cpp Devstral-Small
        roles:
          - chat
          - edit


4. Create a .continue/mcpServers/playwright-mcp.yaml file to integrate a tool, like the Playwright browser automation tool, with your assistant.

name: Playwright mcpServer
    version: 0.0.1
    schema: v1
    mcpServers:
      - name: Browser search
        command: npx
        args:
          - "@playwright/mcp@latest"


Check out the full tutorial in the [the MCP course](https://huggingface.co/learn/mcp-course/unit2/continue-client)
  • 1 reply
Β·
burtenshawΒ 
posted an update 3 months ago
view post
Post
1687
Brand new MCP Course has units are out, and now it's getting REAL! We've collaborated with Anthropic to dive deep into production ready and autonomous agents using MCP

πŸ”— mcp-course

This is what the new material covers and includes:

- Use Claude Code to build an autonomous PR agent
- Integrate your agent with Slack and Github to integrate it with you Team
- Get certified on your use case and share with the community
- Build an autonomous PR cleanup agent on the Hugging Face hub and deploy it with spaces

The material goes deep into these problems and helps you to build applications that work. We’re super excited to see what you build with it.
burtenshawΒ 
posted an update 3 months ago
view post
Post
1568
Super excited to release Autotrain MCP. This is an MCP server for training AI models, so you can use your AI tools to train your AI models 🀯.

πŸ”— burtenshaw/autotrain-mcp

Use this MCP server with tools like Claude Desktop, Cursor, VSCode, or Continue to do this:

- Define an ML problem like Image Classification, LLM fine-tuning, Text Classification, etc.
- The AI can retrieve models and datasets from the hub using the hub MCP.
- Training happens on a Hugging Face space, so no worries about hardware restraints.
- Models are pushed to the hub to be used inference tools like Llama.cpp, vLLM, MLX, etc.
- Built on top of the AutoTrain library, so it has full integration with transformers and other libraries.

Everything is still under active development, but I’m super excited to hear what people build, and I’m open to contributions!
  • 1 reply
Β·
dvilasueroΒ 
posted an update 3 months ago
view post
Post
2859
Super excited to launch Hugging Face Sheets: Spreadsheets meet AI and unstructured data.

A few months ago, we started imagining new ways to build and transform datasets with the latest open-source models.

Today, I'm thrilled to introduce our first step in this direction.


In a nutshell:

πŸ“ Effortlessly run prompts and models over your data.
🌐 Agentic search for accuracy and real-time information.
πŸ–ΌοΈ Familiar, minimalistic interface for interacting with data.
🎯 Human feedback 2.0: Your input directly improves generated data.
πŸ’― Access hundreds of open models and leading inference providers.

Go to this space to try it out!

aisheets/sheets

Leave your questions below, we're just getting started!
Β·
burtenshawΒ 
posted an update 4 months ago
view post
Post
2677
MCP course is now LIVE! We just dropped quizzes, videos, and live streams to make it a fully interactive course:

πŸ”— join in now: mcp-course

- It’s still free!
- Video 1 walks you through onboarding to the course
- The first live session is next week!
- You can now get a certificate via exam app
- We improved and written material with interactive quizzes

If you’re studying MCP and want a live, interactive, visual, certified course, then join us on the hub!
loubnabnlΒ 
posted an update 4 months ago
burtenshawΒ 
posted an update 4 months ago
view post
Post
3270
We're thrilled to announce the launch of our comprehensive Model Context Protocol (MCP) Course! This free program is designed to take learners from foundational understanding to practical application of MCP in AI.

Follow the course on the hub: mcp-course

In this course, you will:
πŸ“– Study Model Context Protocol in theory, design, and practice.
πŸ§‘β€πŸ’» Learn to use established MCP SDKs and frameworks.
πŸ’Ύ Share your projects and explore applications created by the community.
πŸ† Participate in challenges and evaluate your MCP implementations.
πŸŽ“ Earn a certificate of completion.

At the end of this course, you'll understand how MCP works and how to build your own AI applications that leverage external data and tools using the latest MCP standards.
  • 1 reply
Β·
burtenshawΒ 
posted an update 4 months ago
view post
Post
2449
Qwen 3 Fine tuning >> MoE. Update the experiment thread to include config and script for fine-tuning the Qwen3-30B-A3B model.

The goal is to make a low latency non-thinking model for a daily driver coding, so 3 billion parameters active should be perfect.

βœ”οΈ training running
βœ”οΈ evals running
⏭️ improve dataset

The moe isn't going to fit into colab's A100 even with quantization (πŸ™ @UnslothAI ). So I've been working on HF spaces' H100s for this. Everything is available in the tread and I'll share more tomorrow.

burtenshaw/Qwen3-Code-Lite#1
burtenshawΒ 
posted an update 5 months ago
view post
Post
2677
The rebooted LLM course starts today with an overhauled chapter 1 on Transformers:

πŸ‘‰ Follow the org to join the course: huggingface-course

We’re starting from the foundations of modern generative AI by looking at transformers. This chapter is expanded in depth and features so contains new material like:

FREE and CERTIFIED exam on fundamentals of transformers
deeper exploration of transformer architectures and attention mechanisms
end -to-end exploration of inference strategies for prefill and decode steps

The course has leveled up in complexity and depth, so this a great time to join in if you want to build you own AI models.