1 2 3

dan su

sudanenator

AI & ML interests

None yet

Recent Activity

reacted to mrs83's post with 😎 9 days ago

🚀 Just released a PoC: Kurtis-E1 MLX Voice Agent An offline, privacy-first voice assistant built for macOS (Apple Silicon), designed for empathetic, short-form interactions. 🧠 Powered by: - Whisper (via MLX) for speech-to-text: https://pypi.org/project/mlx-whisper/ - Kurtis-E1 (a custom SmolLM2 LLM) via Ollama - Coqui-TTS XTTSv2 for multilingual TTS - Optional translation layer via TowerInstruct-13B-v0.1 for non-English voice input/output: https://huggingface.co/Unbabel/TowerInstruct-13B-v0.1 🎧 Everything runs entirely on-device (Mac Mini M4 Max - 24gb) — no cloud, no remote API calls, no data leakage. 💡 Code is fully handcrafted (no AI-generated code), and designed to showcase what’s possible with local models, even on laptops. 🛠️ Open to contributions, ideas (e.g., LM Studio for MLX inference, MLX worker subprocess, optimize for latency and VRAM usage). 👉 Video demo (Italian): https://www.youtube.com/watch?v=8-1PcmUStaI PoC: https://github.com/ethicalabs-ai/Kurtis-E1-MLX-Voice-Agent Kurtis-E1: https://huggingface.co/collections/ethicalabs/kurtis-e1-67a9148e0836885c44c7902c Kurtis-E1 WebGPU: https://huggingface.co/spaces/ethicalabs/Kurtis-E1-WebGPU

reacted to AdinaY's post with 😎 about 1 month ago

reacted to MonsterMMORPG's post with 🔥 about 1 month ago

Wan 2.1 AI Video Model: Ultimate Step-by-Step Tutorial for Windows & Affordable Private Cloud Setup : https://youtu.be/hnAhveNy-8s https://youtu.be/hnAhveNy-8s Please check all screenshots to see latest news and updates after the tutorial video Alibaba’s new Wan 2.1 text-to-video, video-to-video and image-to-video Open Source AI is unbelievable. In this tutorial I will show how you can install Wan 2.1 all publicly published models into your Windows PC with 1-click installation and use them with the easiest possible way. With the Gradio APP I have developed, you will be able to use Wan AI with as low as 3.5GB VRAM having GPUs. Furthermore, for those who want to utilize powerful private cloud GPUs with cheapest possible prices, I will show how to 1-click and install Wan 2.1 on Massed Compute and on RunPod. Additionally, I will compare performance of RTX 3090 TI with RTX 5090 on all Wan 2.1 models. You will be shocked to see performance of RTX 5090. Also the APP I developed supports all RTX 5000 series on Windows with Python VENV natively. You don't need Linux or WSL. 🔗 Full Instructions, Configs, Installers, Information and Links Shared Post (the one used in the tutorial) ⤵️ ▶️ https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-123105403 🔗 SECourses Official Discord 9500+ Members ⤵️ ▶️ https://discord.com/servers/software-engineering-courses-secourses-772774097734074388 🔗 Stable Diffusion, FLUX, Generative AI Tutorials and Resources GitHub ⤵️ ▶️ https://github.com/FurkanGozukara/Stable-Diffusion 🔗 SECourses Official Reddit - Stay Subscribed To Learn All The News and More ⤵️ ▶️ https://www.reddit.com/r/SECourses/ 🔗 MSI RTX 5090 TRIO FurMark Benchmarking + Overclocking + Noise Testing and Comparing with RTX 3090 TI ⤵️ ▶️ https://youtu.be/uV3oqdILOmA 🔗 RTX 5090 Tested Against FLUX DEV, SD 3.5 Large, SD 3.5 Medium, SDXL, SD 1.5, AMD 9950X + RTX 3090 TI ⤵️ ▶️ https://youtu.be/jHlGzaDLkto

View all activity

Organizations

sudanenator's activity

reacted to mrs83's post with 😎 9 days ago

Post

3193

🚀 Just released a PoC: Kurtis-E1 MLX Voice Agent

An offline, privacy-first voice assistant built for macOS (Apple Silicon), designed for empathetic, short-form interactions.

🧠 Powered by:
- Whisper (via MLX) for speech-to-text: https://pypi.org/project/mlx-whisper/
- Kurtis-E1 (a custom SmolLM2 LLM) via Ollama
- Coqui-TTS XTTSv2 for multilingual TTS
- Optional translation layer via TowerInstruct-13B-v0.1 for non-English voice input/output: Unbabel/TowerInstruct-13B-v0.1

🎧 Everything runs entirely on-device (Mac Mini M4 Max - 24gb) — no cloud, no remote API calls, no data leakage.
💡 Code is fully handcrafted (no AI-generated code), and designed to showcase what’s possible with local models, even on laptops.
🛠️ Open to contributions, ideas (e.g., LM Studio for MLX inference, MLX worker subprocess, optimize for latency and VRAM usage).

👉 Video demo (Italian): https://www.youtube.com/watch?v=8-1PcmUStaI

PoC: https://github.com/ethicalabs-ai/Kurtis-E1-MLX-Voice-Agent
Kurtis-E1: ethicalabs/kurtis-e1-67a9148e0836885c44c7902c
Kurtis-E1 WebGPU: ethicalabs/Kurtis-E1-WebGPU

2 replies

reacted to AdinaY's post with 😎 about 1 month ago

Post

4020

Exciting releases from the Chinese community this February🔥
👉 https://huggingface.co/collections/zh-ai-community/2025-february-67a35aaa68e97812def5b6ef

MLLM:
✨ Ovis2 by Alibaba
AIDC-AI/ovis2-67ab36c7e497429034874464
✨ Step Audio Chat by StepFun AI
stepfun-ai/step-audio-67b33accf45735bb21131b0b

Audio:
✨ Step Audio TTS by StepFunAI
stepfun-ai/Step-Audio-TTS-3B
✨ InspireMusic by Alibaba

FunAudioLLM
✨ Baichuan Audio by BaichuanAI
baichuan-inc/Baichuan-Audio-Instruct

Video:
✨ Wan2.1 by Alibaba_Wan
Wan-AI/Wan2.1-T2V-14B
✨ Stepvideo-T2V by StepFun AI
stepfun-ai/stepvideo-t2v
✨ SkyReels-V1 by Skywork
Skywork/skyreels-v1-67b34676ff65b4ec02d16307
✨ LLaDA-8B by RenminUniversity
GSAI-ML/LLaDA-8B-Instruct

MoE:
✨ Moonlight-16B by MoonshotAI (Kimi)
moonshotai/Moonlight-16B-A3B-Instruct

Reasoning:
✨ TinyR1-32B by Qihoo360
qihoo360/TinyR1-32B-Preview

Dataset:
✨ Chinese DeepSeek R1-Distill data -110k
Congliu/Chinese-DeepSeek-R1-Distill-data-110k

reacted to MonsterMMORPG's post with 🔥 about 1 month ago

Post

3384

Wan 2.1 AI Video Model: Ultimate Step-by-Step Tutorial for Windows & Affordable Private Cloud Setup : https://youtu.be/hnAhveNy-8s

https://youtu.be/hnAhveNy-8s

Please check all screenshots to see latest news and updates after the tutorial video

Alibaba’s new Wan 2.1 text-to-video, video-to-video and image-to-video Open Source AI is unbelievable. In this tutorial I will show how you can install Wan 2.1 all publicly published models into your Windows PC with 1-click installation and use them with the easiest possible way. With the Gradio APP I have developed, you will be able to use Wan AI with as low as 3.5GB VRAM having GPUs. Furthermore, for those who want to utilize powerful private cloud GPUs with cheapest possible prices, I will show how to 1-click and install Wan 2.1 on Massed Compute and on RunPod. Additionally, I will compare performance of RTX 3090 TI with RTX 5090 on all Wan 2.1 models. You will be shocked to see performance of RTX 5090. Also the APP I developed supports all RTX 5000 series on Windows with Python VENV natively. You don't need Linux or WSL.

🔗 Full Instructions, Configs, Installers, Information and Links Shared Post (the one used in the tutorial) ⤵️
▶️ https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-123105403

🔗 SECourses Official Discord 9500+ Members ⤵️
▶️ https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

🔗 Stable Diffusion, FLUX, Generative AI Tutorials and Resources GitHub ⤵️
▶️ https://github.com/FurkanGozukara/Stable-Diffusion

🔗 SECourses Official Reddit - Stay Subscribed To Learn All The News and More ⤵️
▶️ https://www.reddit.com/r/SECourses/

🔗 MSI RTX 5090 TRIO FurMark Benchmarking + Overclocking + Noise Testing and Comparing with RTX 3090 TI ⤵️
▶️ https://youtu.be/uV3oqdILOmA

🔗 RTX 5090 Tested Against FLUX DEV, SD 3.5 Large, SD 3.5 Medium, SDXL, SD 1.5, AMD 9950X + RTX 3090 TI ⤵️
▶️ https://youtu.be/jHlGzaDLkto

1 reply

reacted to Kseniase's post with ➕👍 about 1 month ago

Post

9656

8 Free Sources about AI Agents:

Agents seem to be everywhere and this collection is for a deep dive into the theory and practice:

1. "Agents" Google's whitepaper by Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic -> https://www.kaggle.com/whitepaper-agents
Covers agents, their functions, tool use and how they differ from models

2. "Agents in the Long Game of AI. Computational Cognitive Modeling for Trustworthy, Hybrid AI" book by Marjorie McShane, Sergei Nirenburg, and Jesse English -> https://direct.mit.edu/books/oa-monograph/5833/Agents-in-the-Long-Game-of-AIComputational
Explores building AI agents, using Hybrid AI, that combines ML with knowledge-based reasoning

3. "AI Engineer Summit 2025: Agent Engineering" 8-hour video -> https://www.youtube.com/watch?v=D7BzTxVVMuw
Experts' talks that share insights on the freshest Agent Engineering advancements, such as Google Deep Research, scaling tips and more

4. AI Agents Course from Hugging Face -> https://huggingface.co/learn/agents-course/en/unit0/introduction
Agents' theory and practice to learn how to build them using top libraries and tools

5. "Artificial Intelligence: Foundations of Computational Agents", 3rd Edition, book by David L. Poole and Alan K. Mackworth -> https://artint.info/3e/html/ArtInt3e.html
Agents' architectures, how they learn, reason, plan and act with certainty and uncertainty

6. "Intelligent Agents: Theory and Practice" book by Michael Wooldridge -> https://www.cs.ox.ac.uk/people/michael.wooldridge/pubs/ker95/ker95-html.html
A fascinating option to dive into how agents were seen in 1995 and explore their theory, architectures and agent languages

7. The Turing Post articles "AI Agents and Agentic Workflows" on Hugging Face -> @Kseniase
We explore agentic workflows in detail and agents' building blocks, such as memory and knowledge

8. Our collection "8 Free Sources to Master Building AI Agents" -> https://www.turingpost.com/p/building-ai-agents-sources

4 replies

reacted to mkurman's post with 👍 about 2 months ago

Post

2043

I've been working on something cool: a GRPO with an LLM evaluator that can also perform SFT on the feedback data - if you want. Check it out 😊

Any 🌟are more than welcome 🤗

https://github.com/mkurman/grpo-llm-evaluator

reacted to CultriX's post with ❤️ about 2 months ago

Post

2493

Final upgrade to the Multi-Agent Task Completion Space: CultriX/MultiAgent-CodeTask .

It now includes :
- a live stream of the progress being made on the task (see included video),
- The following components:
1. Automatic prompt optimization
2. An orchestrator deciding which agent to call dynamically including feedback from a human (human-in-the-loop)
3. A coding agent to complete the task
4. A code reviewing agent to iteratively provide feedback to improve the code generated by the coding agent until the code meets the required criteria after which it is approved.
5. A testing agent that tests the approved code or provides information on how to test it.
6. A documentation agent that provides documentation and a help message for the approved and tested code.

reacted to davidberenstein1957's post with 🤗 about 2 months ago

Post

2883

Anyone can create free hosted tools for their AI agents! 🔥

Agentic RAG stack part 2 - augment
Augment retrieval results by reranking optimises content without increasing time too much

part2: https://huggingface.co/blog/davidberenstein1957/ai-blueprint-agentic-rag-part-2-augment
code: https://github.com/huggingface/ai-blueprint

reacted to nyuuzyou's post with 👍 about 2 months ago

Post

2469

📱 UI Navigation Corpus - teleren/ui-navigation-corpus

A comprehensive collection of mobile and web UI elements created by a new member of the Hugging Face community @teleren . I'm glad that I was able to provide a little help together with @its5Q to get this dataset published.

This dataset contains:
- Screenshots and recordings of mobile (iOS/Android) and web interfaces
- UI navigation annotations and metadata
- Screen categorization tags and text extractions
- Navigation paths and screen relationships
- Version control for UI imagery

Perfect for training UI navigation agents and understanding interface patterns. The dataset provides detailed annotations linking screens, sections, and navigation flows together.

reacted to chansung's post with 👍 2 months ago

Post

2067

Simple Summarization on DeepSeek-R1 from DeepSeek AI

The RL stage is very important.
↳ However, it is difficult to create a truly helpful AI for people solely through RL.
↳ So, we applied a learning pipeline consisting of four stages: providing a good starting point, reasoning RL, SFT, and safety RL, and achieved performance comparable to o1.
↳ Simply fine-tuning other open models with the data generated by R1-Zero (distillation) resulted in performance comparable to o1-mini.

Of course, this is just a brief overview and may not be of much help. All models are accessible on Hugging Face, and the paper can be read through the GitHub repository.

Model:

deepseek-ai
Paper: https://github.com/deepseek-ai/DeepSeek-R1

1 reply

reacted to danielhanchen's post with 🔥 3 months ago

Post

3192

Deepseek V3, including GGUF + bf16 versions are now uploaded!

Includes 2, 3, 4, 5, 6 and 8-bit quantized versions.

GGUFs: unsloth/DeepSeek-V3-GGUF
bf16: unsloth/DeepSeek-V3-bf16

Min. hardware requirements to run: 48GB RAM + 250GB of disk space for 2-bit.

See how to run them with examples and the full collection: unsloth/deepseek-v3-all-versions-677cf5cfd7df8b7815fc723c

reacted to reddgr's post with 👀 3 months ago

Post

2363

Major update on the Talking to Chatbots dataset! Expanded the 'wrapped' dataset (one row per chat) to 2.86k records, and the 'unwrapped' version (one row per conversation turn) to 11k records. The main source is my ChatGPT archive with nearly 2 years of chats. It is still a work in progress as I incorporate chats from other sources and qualitative metrics (SCBN) for responses.

reddgr/talking-to-chatbots-unwrapped-chats

reddgr/talking-to-chatbots-chats

reacted to Xenova's post with 👍 8 months ago

Post

8091

Introducing Whisper Diarization: Multilingual speech recognition with word-level timestamps and speaker segmentation, running 100% locally in your browser thanks to 🤗 Transformers.js!

Tested on this iconic Letterman interview w/ Grace Hopper from 1983!
- Demo: Xenova/whisper-speaker-diarization
- Source code: Xenova/whisper-speaker-diarization

1 reply

reacted to chansung's post with ❤️ 12 months ago

Post

4413

💻 Smoothing the Transition from Service LLM to Local LLM

Imagine your go-to LLM service is down, or you need to use it offline – yikes! This project is all about having that "Plan B" ready to go. Here's LLaMA Duo I've been building with @sayakpaul :

✨ Fine-tune a smaller LLM: We used Hugging Face's alignment-handbook to teach a smaller-sized LLM to mimic my favorite large language model. Think of it as that super-smart AI assistant getting a capable understudy.

🤖 Batch Inference: Let's get that fine-tuned LLM working! My scripts generate lots of text like a champ, and we've made sure things run smoothly even with bigger workloads.

🧐 Evaluation: How well is my small LLM doing? We integrated with the Gemini API to use it as an expert judge – it compares my model's work to the original. Talk about a tough critic!

🪄 Synthetic Data Generation: Need to boost that model's performance? Using Gemini's feedback, we can create even more training data, custom-made to make the LLM better.

🧱 Building Blocks: This isn't just a one-time thing – it's a toolkit for all kinds of LLMOps work. Want to change your evaluation metrics? Bring in models trained differently? Absolutely, let's make it happen.

Why this project is awesome:

💪 Reliability: Keep things running no matter what happens to your main LLM source.
🔒 Privacy: Process sensitive information on your own terms.
🗺️ Offline capable: No internet connection? No problem!
🕰️ Version Control: Lock in your favorite LLM's behavior, even if the service model changes.

We'm excited to share the code on GitHub. Curious to see what you all think! 👉🏻 https://github.com/deep-diver/llamaduo