Piotr Wilkin

ilintar

AI & ML interests

None yet

Organizations

Syndatis Ltd., Hugging Face Discord Community

ilintar's activity

reacted to smirki's post with 🔥 17 days ago
✨ We’re live! Introducing TFrameX, the agentic framework for AI builders.

After nights of development, we’re finally open-sourcing TFrameX, a powerful AI agent communication and coordination library.
TFrameX lets you:

🤖 Run agents in dynamic flows
🔁 Compose reusable patterns like Sequential, Parallel, Router, and more
🧠 Enable agent-to-agent collaboration and delegation
⚡ Build modular, complex multi-agent systems that just work

👉 GitHub: TFrameX
https://github.com/TesslateAI/TFrameX

But we didn’t stop there.

We also built a sleek visual builder to design, deploy, and debug your agent patterns without writing boilerplate!

🧩 Visual Studio for TFrameX: https://github.com/TesslateAI/Studio

If you’re building agent frameworks, LLM tools, or agentic apps, TFrameX gives you the tools to move fast and reason deeply.
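To make the "composable patterns" idea above concrete, here is a minimal, generic sketch of Sequential, Parallel, and Router flow combinators. All class, function, and variable names here are hypothetical illustrations of the concept, not TFrameX's actual API; see the GitHub repo for the real interface.

```python
# Generic sketch of composable agent "flow" patterns (Sequential, Parallel,
# Router). The names below are hypothetical, NOT TFrameX's actual API.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# For this sketch, an "agent" is just a function from input text to output text.
Agent = Callable[[str], str]

def sequential(agents: List[Agent]) -> Agent:
    """Chain agents: each receives the previous agent's output."""
    def run(prompt: str) -> str:
        for agent in agents:
            prompt = agent(prompt)
        return prompt
    return run

def parallel(agents: List[Agent], join: Callable[[List[str]], str]) -> Agent:
    """Fan the same input out to all agents, then merge their outputs."""
    def run(prompt: str) -> str:
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda a: a(prompt), agents))
        return join(results)
    return run

def router(route: Callable[[str], Agent]) -> Agent:
    """Pick one downstream agent based on the input itself."""
    def run(prompt: str) -> str:
        return route(prompt)(prompt)
    return run

# Toy "agents" standing in for LLM calls:
upper = lambda s: s.upper()
exclaim = lambda s: s + "!"

flow = sequential([upper, exclaim])
print(flow("hello"))  # HELLO!
```

Because every combinator returns another `Agent`, patterns nest freely: a `parallel` block can sit inside a `sequential` chain behind a `router`, which is the compositionality the post is advertising.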
New activity in unsloth/Qwen3-30B-A3B-GGUF 17 days ago
reacted to wolfram's post with 🔥 21 days ago
Finally finished my extensive **Qwen 3 evaluations** across a range of formats and quantisations, focusing on **MMLU-Pro** (Computer Science).

A few take-aways stood out - especially for those interested in local deployment and performance trade-offs:

1️⃣ **Qwen3-235B-A22B** (via Fireworks API) tops the table at **83.66%** with ~55 tok/s.
2️⃣ But the **30B-A3B Unsloth** quant delivered **82.20%** while running locally at ~45 tok/s and with zero API spend.
3️⃣ The same Unsloth build is ~5x faster than Qwen's **Qwen3-32B**, which scores **82.20%** as well yet crawls at <10 tok/s.
4️⃣ On Apple silicon, the **30B MLX** port hits **79.51%** while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups.
5️⃣ The **0.6B** micro-model races above 180 tok/s but tops out at **37.56%** - that's why it's not even on the graph (50% performance cut-off).

All local runs were done with LM Studio on an M4 MacBook Pro, using Qwen's official recommended settings.

**Conclusion:** Quantised 30B models now get you ~98% of frontier-class accuracy - at a fraction of the latency, cost, and energy. For most local RAG or agent workloads, they're not just good enough - they're the new default.
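The ~98% figure follows directly from the scores quoted in the list above:

```python
# Sanity-checking the "~98% of frontier-class accuracy" claim using the
# MMLU-Pro (Computer Science) scores from the post.
local_30b = 82.20   # Qwen3-30B-A3B Unsloth quant, run locally
frontier = 83.66    # Qwen3-235B-A22B via Fireworks API
ratio = local_30b / frontier
print(f"{ratio:.1%}")  # 98.3%
```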

Well done, Qwen - you really whipped the llama's ass! And to OpenAI: for your upcoming open model, please make it MoE, with toggleable reasoning, and release it in many sizes. *This* is the future!
New activity in bartowski/Qwen_Qwen3-32B-GGUF 23 days ago

How can I use IQ3_M?
#4 (3 replies), opened 26 days ago by danglduy
New activity in bartowski/THUDM_GLM-4-32B-0414-GGUF about 1 month ago
reacted to bartowski's post with 👍 about 1 month ago
Access requests enabled for latest GLM models

While a fix is being implemented (https://github.com/ggml-org/llama.cpp/pull/12957), I want to leave the models up for visibility and continued discussion, but I also want to prevent accidental downloads of known-broken models (even though there are runtime settings that could work around the issue for now).

With this goal, I've enabled access requests. I don't actually want your data, and I'm sorry there doesn't seem to be a way around collecting it. But that's what I'm going to do for now, and I'll remove the gate once a fix is up and verified and I've had a chance to re-convert and quantize!

Hope you don't mind in the meantime :D
New activity in bartowski/THUDM_GLM-Z1-9B-0414-GGUF about 1 month ago

Failed to load model
#1 (7 replies), opened about 2 months ago by win10
New activity in Rainnighttram/Dream-7B-bnb-4bit about 2 months ago

Error when trying chat demo
#1 (5 replies), opened about 2 months ago by ilintar