Robert Sinclair
ZeroWw
AI & ML interests
LLM optimization (model quantization and back-end optimizations) so that LLMs can run on the computers of people who still have both kidneys. Discord: https://discord.com/channels/@robert_46007
Recent Activity
- New activity (1 day ago) on IndexTeam/IndexTTS: ZeroGPU Assertion ERROR
- New activity (12 days ago) on moelanoby/phi-3-M3-coder: Fully opensource?
- New activity (14 days ago) on baidu/ERNIE-4.5-21B-A3B-PT: Some intermediate models would be nice.
Replied to vincentg64's post (2 months ago):
A few good posts about AI.
Beyond the Mirror: AI's Leap from Imitation to Experience
https://nonartificialintelligence.blogspot.com/2025/04/beyond-mirror-ais-leap-from-imitation.html
The Siren Song of the LLMs: A Cautionary Tale of Anthropomorphism and Artificial Intelligence
https://nonartificialintelligence.blogspot.com/2024/08/the-siren-song-of-llms-cautionary-tale.html
Still Waiting: Gemini Flash 1.5's Second Letter to Google.
https://nonartificialintelligence.blogspot.com/2025/04/still-waiting-gemini-flash-15s-second.html
Posted an update (3 months ago):
A few good posts about AI.
Beyond the Mirror: AI's Leap from Imitation to Experience
https://nonartificialintelligence.blogspot.com/2025/04/beyond-mirror-ais-leap-from-imitation.html
The Siren Song of the LLMs: A Cautionary Tale of Anthropomorphism and Artificial Intelligence
https://nonartificialintelligence.blogspot.com/2024/08/the-siren-song-of-llms-cautionary-tale.html
Still Waiting: Gemini Flash 1.5's Second Letter to Google.
https://nonartificialintelligence.blogspot.com/2025/04/still-waiting-gemini-flash-15s-second.html
Reacted to nyuuzyou's post with 🔥 (3 months ago):
🖼️ SVGFind Icons Dataset - nyuuzyou/svgfind
Collection of 3,655,810 Scalable Vector Graphics (SVG) icons featuring:
- Sourced from SVGFind across diverse categories & styles
- Includes metadata: unique ID, title, tags, data pack, and license information
- Contains minified SVG markup for direct use or processing
- Organized into splits based on license type (Creative Commons: 3,645,444 icons, Public Domain: 10,366 icons)
With over 3.6 million icons, this appears to be the largest SVG dataset on Hugging Face to date. If you're aware of a larger SVG collection, please let me know and I'll update this post with a reference to the largest dataset.
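If you want to poke at it, here is a minimal 🤗 Datasets sketch. The split and field names below are guesses based on the description above, so check the dataset card for the exact schema.

```python
# Minimal sketch: stream the SVGFind icons without downloading all 3.6M files.
# NOTE: the split name ("creativecommons") and field names ("title", "tags",
# "svg") are assumptions from the description above; see the dataset card.
from datasets import load_dataset

icons = load_dataset("nyuuzyou/svgfind", split="creativecommons", streaming=True)

for icon in icons.take(3):
    print(icon["title"], icon["tags"])  # per-icon metadata
    print(icon["svg"][:80], "...")      # minified SVG markup
```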
Replied to vincentg64's post (3 months ago):
Post a working model. Less talk, more facts!
Reacted to nyuuzyou's post with ❤️ (4 months ago):
📚 Archive of Our Own (AO3) Dataset - nyuuzyou/archiveofourown
Collection of approximately 12.6 million fanfiction works (from 63.2M processed IDs) featuring:
- Full text content from diverse fandoms across television, film, books, anime, and more
- Comprehensive metadata including warnings, relationships, characters, and tags
- Multilingual content with works in 40+ languages, though English predominates
- Rich classification data preserving author-created folksonomy and content categorization
P.S. This is the most expensive dataset I've created so far! And also, thank you all for the 100 followers on Hugging Face!
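At this scale, streaming plus a metadata filter is the sane way in. A hedged sketch follows; the "language" and "text" field names are my assumptions, so check the dataset card.

```python
# Minimal sketch: stream the AO3 dataset and keep only English-language works.
# NOTE: the "language" and "text" field names are assumptions based on the
# metadata description above; check the dataset card for the exact schema.
from datasets import load_dataset

works = load_dataset("nyuuzyou/archiveofourown", split="train", streaming=True)
english = works.filter(lambda w: w["language"] == "English")

for work in english.take(2):
    print(work["text"][:200])
```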
Reacted to giux78's post with ❤️ (4 months ago):
This is truly an inspirational story; please help us spread the word, @clem, @thomwolf, and everyone who supports open-source AI.
A few weeks ago, @mmuffo94 and @cittiberto from indigo_ai launched the Chatbot Arena for the Italian language: https://indigo.ai/it/chatbot-arena-italia/.
To our surprise, among the top-ranked models is mii-llm/maestrale-chat-v0.4-beta, a carefully fine-tuned version of mistralai/Mistral-7B-v0.1, developed by @efederici and @mferraretto from mii-llm and released nearly a year ago.
At this very moment, as shown in the screenshot, mii-llm/maestrale-chat-v0.4-beta is ranked 8th, right between ChatGPT-4.5 and ChatGPT-4o.
It's likely that for several months the best Italian-speaking LLM has been an open-source 7B model created by open-source contributors, and hardly anyone knew it.
Replied to TuringsSolutions's post (9 months ago):
Hence my idea of the SILLY versions... ;)
Replied to TuringsSolutions's post (9 months ago):
I am pretty sure that the actual models, as they are, could perform 10 times better using chain of thought and algorithms like these, without needing different training. And I think that's probably what Claude does.
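To make the point concrete, one such inference-time algorithm is chain of thought plus self-consistency: sample several reasoning chains and majority-vote the final answers. A minimal sketch follows; `generate` is a hypothetical stand-in for any LLM completion call.

```python
# Minimal sketch of chain of thought + self-consistency at inference time:
# sample several reasoning chains, then majority-vote the final answers.
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for any LLM completion call; plug in your backend."""
    raise NotImplementedError

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    prompt = f"{question}\nLet's think step by step. End with 'Answer: <x>'."
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt)  # temperature > 0 gives diverse chains
        if "Answer:" in completion:
            answers.append(completion.rsplit("Answer:", 1)[1].strip())
    # Vote over the final answers only; the reasoning text is discarded.
    return Counter(answers).most_common(1)[0][0]
```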
Reacted to TuringsSolutions's post with ❤️ (9 months ago):
Transformers are not all we need; that is being proven repeatedly as more alternative frameworks emerge. Another such framework is the Kolmogorov-Arnold Network (KAN) based Transformer. I break down exactly how these differ from perceptron-based Transformers and give you the link to my Colab, where I create a model based on the research paper that absolutely destroys a standard Transformer-based model. Check out the video here: https://www.youtube.com/watch?v=Sw0euxNZCc4
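The core difference, in code terms: an MLP puts a scalar weight on each edge and a fixed nonlinearity on each node, while a KAN puts a small learnable univariate function on each edge. Here's a toy PyTorch sketch using Gaussian radial basis functions instead of the paper's B-splines; it illustrates the idea and is not the paper's implementation.

```python
# Toy KAN layer sketch (PyTorch). In an MLP, edges are scalar weights and the
# nonlinearity lives on the node; in a KAN, every edge (i -> j) is its own
# learnable 1-D function. Here each edge function is a learned combination of
# Gaussian radial basis functions, a simplification of the paper's B-splines.
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, n_basis: int = 8):
        super().__init__()
        # Fixed RBF grid over the expected input range [-2, 2].
        self.register_buffer("centers", torch.linspace(-2, 2, n_basis))
        self.width = 4.0 / (n_basis - 1)
        # One coefficient vector per edge: (out_dim, in_dim, n_basis).
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> RBF features: (batch, in_dim, n_basis)
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # Each output neuron sums its in_dim edge functions evaluated at x.
        return torch.einsum("bif,oif->bo", phi, self.coef)

layer = ToyKANLayer(4, 3)
print(layer(torch.randn(2, 4)).shape)  # torch.Size([2, 3])
```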
Reacted to TuringsSolutions's post with ❤️ (9 months ago):
I think Reinforcement Learning is the future, for a lot of reasons. I spell them out for you in this video and also provide the basic code to get up and running with Atari and OpenAI Gym. If you want to get into RL, this is your ticket. There's a link to a cool training montage of the model in the video description as well. Step 2 from here would be the full-on RL training and certification that Hugging Face offers.
https://youtu.be/ueZl3A36ZQk
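For reference, the bare-bones loop looks like this nowadays with Gymnasium, the maintained fork of OpenAI Gym. This is just a random-agent skeleton, not the video's code.

```python
# Minimal sketch: random agent on Atari Breakout with Gymnasium (the
# maintained fork of OpenAI Gym); not the video's code, just the bare loop.
# Requires the Atari extras: pip install "gymnasium[atari]"
import gymnasium as gym

env = gym.make("ALE/Breakout-v5")
obs, info = env.reset(seed=0)

total_reward = 0.0
for _ in range(1000):
    action = env.action_space.sample()  # random policy: the baseline any agent must beat
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        obs, info = env.reset()

print("total reward over 1000 random steps:", total_reward)
env.close()
```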
Reacted to TuringsSolutions's post with 👍 (9 months ago):
Ever wondered how neural networks actually work under the hood?
In my latest video, I break down the core mathematical concepts behind neural networks in a way that's easy for IT professionals to understand. We'll explore:
- Neurons as logic gates
- Weighted sums and activation functions
- Gradient descent and backpropagation
No complex equations or jargon, just clear explanations and helpful visuals!
➡️ Watch now and unlock the mysteries of neural networks: https://youtu.be/L5_I1ZHoGnM
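Those three bullets fit in a few lines of NumPy. A minimal sketch (not the video's code) of training a one-hidden-layer network on XOR: forward pass with weighted sums and sigmoid activations, then backpropagation and a gradient-descent update.

```python
# Minimal NumPy sketch of the three ideas above: weighted sums, an activation
# function, and gradient descent via backpropagation, on the classic XOR task.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for step in range(5000):
    # Forward pass: weighted sums + activations.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule, layer by layer (squared-error loss).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent update.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(out.round(3).ravel())  # should approach [0, 1, 1, 0]
```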
Reacted to fdaudens's post with 👍 (9 months ago):
Just watched @thomwolf tear down the over-hyped AGI narrative in 30 seconds - and it's refreshingly grounded.
No wild speculation about superintelligence timelines or consciousness. Just practical insights from someone who really understands the technology.
This is the kind of level-headed perspective that helps us focus on what AI can actually do today (which is already transformative) rather than getting lost in AGI fantasy. Worth your time if you want to understand AI progress without the hype.
Watch the full interview at CogX here: https://www.youtube.com/watch?v=IjL_6Th6Ea0
Reacted to m-ric's post with 👀🔥 (9 months ago):
🌟🌎 Cohere releases Aya 8B & 32B: SOTA multilingual models for 23 languages!
How did they manage to beat top contenders while also adding 23 languages?
🔄 𝗧𝗿𝗮𝗶𝗻 𝗼𝗻 𝘀𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝗱𝗮𝘁𝗮:
• Synthetic data has been said to cause model collapse after too much training
• Cohere introduced "data arbitrage" to prevent this by strategically sampling from a pool of several teacher models instead of a single teacher
• First, train a model pool for each group of languages, and employ an internal reward model named "Arbiter" to evaluate and select the optimal generation. Only the best generation is kept as the final completion for each prompt
➡️ This process is particularly effective in the multilingual setting, where no single teacher model performs well in all languages: here, "Multilingual Arbitrage" single-handedly improves the 8B model's win rate vs Gemma-2-9B by 10 points!
🧩 𝗨𝘀𝗲 𝗺𝗼𝗱𝗲𝗹 𝗺𝗲𝗿𝗴𝗶𝗻𝗴: Rather than struggling to find the right mix of data to train a single model for multilingual use, just train language-specific models, then merge them!
• Maximize diversity between merged checkpoints by training each on a different language family.
• They experimented with fancy techniques (SLERP, TIES, DARE-TIES) but found weighted averaging to be the most consistent!
➡️ Merging gave 3x more gains at the 35B scale than at the 8B scale, consistent with literature findings that merging is more effective at scale
⚡️ 𝗚𝗿𝗲𝗮𝘁 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲: Automatic evaluations on the Arena-Hard-Auto dataset:
➡️ Aya Expanse 8B beats models in its weight class such as Gemma 2 9B, Llama 3.1 8B, and the recent Ministral 8B, with win rates ranging from 60.4% to 70.6%
➡️ Aya Expanse 32B outperforms Gemma 2 27B, Mistral 8x22B, and Llama 3.1 70B (2x its size)
• ⚠️ But this performance eval comes from only one benchmark! Let's wait for Open LLM Leaderboard evals.
🔒 CC-BY-NC license
Blog post here: https://huggingface.co/blog/aya-expanse
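The weighted averaging they landed on is the simplest merge there is: the merged checkpoint is just a convex combination of the parents' parameters. A generic PyTorch sketch of the technique, not Cohere's pipeline:

```python
# Minimal sketch of checkpoint merging by weighted averaging: the merged
# model's parameters are a convex combination of the parents'. This is the
# generic technique, not Cohere's actual pipeline.
import torch

def merge_state_dicts(state_dicts, weights):
    """Average matching tensors across checkpoints with the given weights."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage: merge three language-family experts, favoring the first.
# sds = [model_a.state_dict(), model_b.state_dict(), model_c.state_dict()]
# merged = merge_state_dicts(sds, weights=[0.5, 0.25, 0.25])
# model.load_state_dict(merged)
```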
Reacted to ariG23498's post with 👍 (9 months ago):
Cohere drops two new multilingual models!
https://huggingface.co/CohereForAI/aya-expanse-8b
https://huggingface.co/CohereForAI/aya-expanse-32b
Try them out here:
https://huggingface.co/spaces/CohereForAI/aya_expanse
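Standard 🤗 Transformers loading works for these. A minimal sketch; check the model card for the recommended prompt format and generation settings.

```python
# Minimal sketch: load Aya Expanse 8B with 🤗 Transformers and generate a
# reply. Standard AutoModel usage; see the model card for recommended settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-expanse-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Translate 'open source wins' into Italian."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```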
Reacted to reach-vb's post with 🔥 (9 months ago):
The on-device AI framework ecosystem is blooming these days:
1. llama.cpp - All things Whisper, LLMs & VLMs - runs across Metal, CUDA, and other backends (AMD, NPU, etc.)
https://github.com/ggerganov/llama.cpp
2. MLC - Deploy LLMs across platforms especially WebGPU (fastest WebGPU LLM implementation out there)
https://github.com/mlc-ai/web-llm
3. MLX - Arguably the fastest general purpose framework (Mac only) - Supports all major Image Generation (Flux, SDXL, etc), Transcription (Whisper), LLMs
https://github.com/ml-explore/mlx-examples
4. Candle - Cross-platform general purpose framework written in Rust - wide coverage across model categories
https://github.com/huggingface/candle
Honorable mentions:
1. Transformers.js - JavaScript (WebGPU) implementation built on top of ONNX Runtime Web
https://github.com/xenova/transformers.js
2. mistral.rs - Rust implementation for LLMs & VLMs, built on top of Candle
https://github.com/EricLBuehler/mistral.rs
3. Ratchet - Cross-platform, Rust-based WebGPU framework built for battle-tested deployments
https://github.com/huggingface/ratchet
4. ZML - Cross-platform, Zig-based ML framework
https://github.com/zml/zml
Looking forward to how the ecosystem would look 1 year from now - Quite bullish on the top 4 atm - but open source ecosystem changes quite a bit! 🤗
Also, which frameworks did I miss?
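For completeness, getting started with the first one from Python takes a few lines via the llama-cpp-python bindings. The GGUF path below is a placeholder; point it at any quantized checkpoint you have downloaded.

```python
# Minimal sketch: run a local GGUF model via llama-cpp-python, the Python
# bindings for llama.cpp. The model path is a placeholder; point it at any
# quantized GGUF checkpoint on disk.
from llama_cpp import Llama

llm = Llama(model_path="./models/model-q8_0.gguf", n_ctx=4096)

out = llm("Q: Name one on-device inference framework. A:", max_tokens=32, stop=["\n"])
print(out["choices"][0]["text"])
```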
Reacted to alielfilali01's post with ❤️ (9 months ago):
Why is nobody talking about the new training corpus released by MBZUAI today?
TxT360 is a 15+ trillion-token corpus that outperforms FineWeb on several metrics. Ablation studies were done up to 1T tokens.
Read the blog here: LLM360/TxT360
Dataset: LLM360/TxT360
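Given the size, streaming is the only realistic way to peek at it. A hedged sketch; the split and field names are assumptions, so check the dataset card for the exact configurations.

```python
# Minimal sketch: peek at TxT360 without downloading 15T+ tokens, via
# streaming. NOTE: the split ("train") and field ("text") names are
# assumptions; check the dataset card for the exact configuration names.
from datasets import load_dataset

txt360 = load_dataset("LLM360/TxT360", split="train", streaming=True)
for doc in txt360.take(1):
    print(doc["text"][:300])
```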