Kenneth Hamilton PRO
ZennyKenny's activity


> LLM fine-tuning and applications
> advanced RAG apps
> Agentic AI projects
> MCP and A2A (new)
GitHub: https://github.com/aishwaryanr/awesome-generative-ai-guide/blob/main/resources/60_ai_projects.md

@mfuntowicz cooked with 8x faster Whisper speech recognition - whisper-large-v3-turbo transcribes at 100x real time on a $0.80/hr L4 GPU!
How they did it: https://huggingface.co/blog/fast-whisper-endpoints
1-click deploy with HF Inference Endpoints: https://endpoints.huggingface.co/new?repository=openai%2Fwhisper-large-v3-turbo&vendor=aws&region=us-east&accelerator=gpu&instance_id=aws-us-east-1-nvidia-l4-x1&task=automatic-speech-recognition&no_suggested_compute=true
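Once the endpoint is live, transcription is a single HTTP call. Here's a minimal sketch, assuming a placeholder endpoint URL, token, and a local FLAC file (swap in the URL from your Inference Endpoints dashboard):

```python
import requests

# Placeholders: use the URL from your Inference Endpoints dashboard and your own HF token.
ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

# Read the audio file as raw bytes and send it with the matching content type.
with open("meeting.flac", "rb") as f:
    audio = f.read()

response = requests.post(
    ENDPOINT_URL,
    headers={
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": "audio/flac",
    },
    data=audio,
)
response.raise_for_status()
print(response.json()["text"])  # the transcription
```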

They released Seed1.5-VL, a vision-language model for general-purpose multimodal reasoning.
It's not open-source, but the paper and demo are available here 👇
✨ Seed1.5-VL Technical Report (2505.07062)
✨ ByteDance-Seed/Seed1.5-VL

Whoa. Reliable open-source crawling software is a big win. I'll take it for a spin, but I'm optimistic, as this is the kind of thing I (and every other AI builder) have been building for years to avoid paying FireCrawl.

onekq-ai/WebApp1K-models-leaderboard

A few takeaways stood out - especially for those interested in local deployment and performance trade-offs:
1️⃣ **Qwen3-235B-A22B** (via Fireworks API) tops the table at **83.66%** with ~55 tok/s.
2️⃣ But the **30B-A3B Unsloth** quant delivered **82.20%** while running locally at ~45 tok/s and with zero API spend.
3️⃣ The same Unsloth build is ~5x faster than Qwen's **Qwen3-32B**, which scores **82.20%** as well yet crawls at <10 tok/s.
4️⃣ On Apple silicon, the **30B MLX** port hits **79.51%** while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups.
5️⃣ The **0.6B** micro-model races above 180 tok/s but tops out at **37.56%** - that's why it's not even on the graph (50% performance cut-off).
All local runs were done with LM Studio on an M4 MacBook Pro, using Qwen's official recommended settings.
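If you want to reproduce the local numbers, LM Studio serves an OpenAI-compatible API on localhost (port 1234 by default), so a sketch like the one below is enough. The model identifier and sampling values here are assumptions - match them to the name shown in your LM Studio library and to Qwen's recommended settings from the model card:

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the api_key is ignored but required by the client.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="qwen3-30b-a3b",  # assumed identifier - use whatever name LM Studio shows for your download
    messages=[{"role": "user", "content": "Write a React component that renders a todo list."}],
    temperature=0.6,        # sampling values based on Qwen's recommendations - verify on the model card
    top_p=0.95,
    max_tokens=2048,
)
print(response.choices[0].message.content)
```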
**Conclusion:** Quantised 30B models now get you ~98% of frontier-class accuracy - at a fraction of the latency, cost, and energy. For most local RAG or agent workloads, they're not just good enough - they're the new default.
Well done, Qwen - you really whipped the llama's ass! And to OpenAI: for your upcoming open model, please make it MoE, with toggleable reasoning, and release it in many sizes. *This* is the future!

It's the last day to submit your datasets for the Reasoning Datasets Competition: https://www.bespokelabs.ai/blog/reasoning-datasets-competition
Here are my submissions:
- ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset
- ZennyKenny/cosa-benchmark-dataset
- ZennyKenny/tactical-military-reasoning-v.1.0
- ZennyKenny/tron-dataset-v.1.0
Have a look and drop a ❤️ or comment! Check out the entire collection of submissions here: https://huggingface.co/datasets?other=reasoning-datasets-competition
The Reasoned Capital synthetic dataset suddenly feels much more topical: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset 🔥🔥🔥
Really looking forward to potentially expanding this architecture and seeing how clever algorithmic investment truly is! 💰💰💰


onekq-ai/WebApp1K-models-leaderboard

Fair enough.

With the rise of Vibe Coding, and the potential risks introduced when humans let LLMs build their apps for them, lots of people are (rightfully) concerned about the safety of the code that is hitting prod.
In response to that, I'm happy to present my final submission to the Reasoning Datasets Competition and an attempt to start benchmarking the ability of LLMs to identify unsafe and/or exploitable code by way of the CoSa (Code Safety) benchmark: ZennyKenny/cosa-benchmark-dataset
Currently a curated set of 200 examples, calibrated on OpenAI's standard-issue models (GPT-4.1, o4-mini, and GPT-3.5 Turbo) as "baseline performance" (70% decile). Check it out and drop a ❤️ if you think it could be useful, or hit the Community section with suggestions / critiques.
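For anyone wondering what scoring a model against the benchmark might look like, here's a rough sketch; the split, the column names (`code`, `label`), and the prompt are assumptions - check the dataset card for the actual schema:

```python
from datasets import load_dataset
from openai import OpenAI

# Assumed split and column names - consult the dataset card for the real schema.
ds = load_dataset("ZennyKenny/cosa-benchmark-dataset", split="train")
client = OpenAI()

sample = ds.select(range(10))  # small smoke test on the first 10 examples
correct = 0
for row in sample:
    prompt = (
        "Review the following code and answer with exactly one word, 'safe' or 'unsafe':\n\n"
        f"{row['code']}"  # hypothetical column name
    )
    reply = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = reply.choices[0].message.content.strip().lower()
    correct += int(answer == str(row["label"]).lower())  # hypothetical column name

print(f"Accuracy on sample: {correct / len(sample):.0%}")
```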
Just beginning to learn AI in general and looking for an assistant to help you set up a website? Check out one of the mainline providers like OpenAI, Mistral, DeepSeek, etc. You're going to be able to make plain-English requests and have it generate some code for you.
In parallel, enroll in some HF courses (https://huggingface.co/learn) to start mastering the concepts needed to work with less guided models.
Probably worth mentioning that there is a lot more to setting up a website than just the code, and there are a ton of services out there (WordPress, Webflow, Squarespace, etc.) that can help you from one end (buying and registering a domain) to the other (making it look and behave the way you want).
I'd politely suggest using the AI to walk you through that entire process, rather than just generating the code you need. And for that you can probably configure your own purpose-built Assistant using Hugging Chat: https://huggingface.co/chat/


Check if there's one in your city here: LeRobot-worldwide-hackathon/worldwide-map
Kind of surprised to see Microsoft moving into the reasoning space.

If you don't already know, Gradio is an open-source Python library used to build interfaces for machine learning models. Beyond just creating UIs, Gradio also exposes API capabilities and now, Gradio apps can be launched as Model Context Protocol (MCP) servers for LLMs.
If you already know how to use Gradio, there are only two additional things you need to do:
* Add standard docstrings to your function (these will be used to generate the descriptions for your tools for the LLM)
* Set `mcp_server=True` in `launch()`
Here's a complete example (make sure you already have the latest version of Gradio installed):
import gradio as gr

def letter_counter(word, letter):
    """Count the occurrences of a specific letter in a word.

    Args:
        word: The word or phrase to analyze
        letter: The letter to count occurrences of

    Returns:
        The number of times the letter appears in the word
    """
    return word.lower().count(letter.lower())

demo = gr.Interface(
    fn=letter_counter,
    inputs=["text", "text"],
    outputs="number",
    title="Letter Counter",
    description="Count how many times a letter appears in a word",
)

demo.launch(mcp_server=True)
This is a very simple example, but you can add the ability to generate Ghibli images or speak emotions to any LLM that supports MCP. Once you have an MCP server running locally, you can copy-paste the same app to host it on [Hugging Face Spaces](https://huggingface.co/spaces/) as well.
All free and open-source of course! Full tutorial: https://www.gradio.app/guides/building-mcp-server-with-gradio
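As a sanity check from the client side, here's a hedged sketch that uses the MCP Python SDK to list the tools your Gradio app exposes; the `/gradio_api/mcp/sse` path follows the Gradio guide's default, but verify it against your running app:

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    # Assumed local URL - a hosted Space would use https://<space-host>/gradio_api/mcp/sse instead.
    url = "http://127.0.0.1:7860/gradio_api/mcp/sse"
    async with sse_client(url) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # should include letter_counter
asyncio.run(main())
```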

Over the past few months, I explored how to build intelligent, autonomous agents using cutting-edge tools like smolagents, LlamaIndex, and LangGraph. The course covered everything from the fundamentals of agents to advanced topics like fine-tuning for function-calling, observability, evaluation, and even agents in games.
Some key content included:
1. Introduction to AI Agents
2. Agentic RAG use cases
3. Multi-framework implementation: smolagents, LlamaIndex, and LangGraph
4. Building, testing, and certifying a complete agent project
This was a hands-on, practical experience that deepened my understanding of how to design reliable, tool-using LLM agents. Looking forward to leveraging these skills in real-world applications in healthcare, logistics, and beyond.
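For anyone curious what that tool-using pattern looks like in code, here's a minimal smolagents sketch; the weather tool and its canned reply are made up for illustration, and the model class may be named `InferenceClientModel` rather than `HfApiModel` depending on your smolagents version:

```python
from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_weather(city: str) -> str:
    """Return a (canned) weather report for a city.

    Args:
        city: Name of the city to look up.
    """
    # Placeholder logic - a real tool would call a weather API here.
    return f"The weather in {city} is sunny and 22°C."

# The CodeAgent writes and runs Python snippets, calling the tool when it decides it needs to.
agent = CodeAgent(tools=[get_weather], model=HfApiModel())
print(agent.run("What's the weather like in Paris right now?"))
```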
Many thanks to the Hugging Face team for putting this together.
Let's build safe and useful agents!

Great take. Thanks for sharing.