✨ We’re live! Introducing TFrameX, the agentic framework for AI builders.
After many late nights of development, we're finally open-sourcing TFrameX, a powerful AI agent communication and coordination library. TFrameX lets you:
🤖 Run agents in dynamic flows
🔁 Compose reusable patterns like Sequential, Parallel, Router, and more
🧠 Enable agent-to-agent collaboration and delegation
⚡ Build modular, complex multi-agent systems that just work
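To make that concrete, here's a rough sketch of the kind of composition these patterns enable. The names below are purely illustrative, not TFrameX's actual API; check the repo for the real interfaces.

```python
# Hypothetical sketch -- illustrative names only, NOT TFrameX's actual API.
# Shows the general shape of sequential / parallel / router agent patterns.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    name: str
    run: Callable[[str], str]  # takes a prompt, returns a response

def sequential(agents: List[Agent], prompt: str) -> str:
    """Pipe each agent's output into the next agent."""
    out = prompt
    for agent in agents:
        out = agent.run(out)
    return out

def parallel(agents: List[Agent], prompt: str) -> List[str]:
    """Fan the same prompt out to every agent and collect all answers."""
    return [agent.run(prompt) for agent in agents]

def router(route: Callable[[str], Agent], prompt: str) -> str:
    """Let a routing function decide which agent handles the prompt."""
    return route(prompt).run(prompt)

# Toy usage with stub agents standing in for LLM-backed ones:
researcher = Agent("researcher", lambda p: f"[research notes on: {p}]")
writer = Agent("writer", lambda p: f"[draft written from: {p}]")
print(sequential([researcher, writer], "multi-agent frameworks"))
```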
Finally finished my extensive **Qwen 3 evaluations** across a range of formats and quantisations, focusing on **MMLU-Pro** (Computer Science).
A few takeaways stood out - especially for those interested in local deployment and performance trade-offs:
1️⃣ **Qwen3-235B-A22B** (via Fireworks API) tops the table at **83.66%** with ~55 tok/s.
2️⃣ But the **30B-A3B Unsloth** quant delivered **82.20%** while running locally at ~45 tok/s and with zero API spend.
3️⃣ The same Unsloth build is ~5x faster than Qwen's **Qwen3-32B**, which scores **82.20%** as well yet crawls at <10 tok/s.
4️⃣ On Apple silicon, the **30B MLX** port hits **79.51%** while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups.
5️⃣ The **0.6B** micro-model races above 180 tok/s but tops out at **37.56%** - that's why it's not even on the graph (50% performance cut-off).
All local runs were done with LM Studio on an M4 MacBook Pro, using Qwen's official recommended settings.
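If you want to reproduce or extend these runs, LM Studio serves an OpenAI-compatible API on localhost by default, so a quick scripted check can look like the sketch below. The model identifier and sampling values are assumptions on my part (based on Qwen's published recommendations for thinking mode); verify against the model card.

```python
# Minimal sketch for querying a local LM Studio server via its
# OpenAI-compatible endpoint (http://localhost:1234/v1 is the default).
# Model ID and sampling values are assumptions -- check Qwen's model card.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="qwen3-30b-a3b",  # hypothetical local model identifier
    messages=[{"role": "user", "content": "Explain a B-tree in two sentences."}],
    temperature=0.6,  # Qwen's suggested thinking-mode value (verify for your build)
    top_p=0.95,       # likewise; top_k / min_p are set in LM Studio's UI
)
print(response.choices[0].message.content)
```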
**Conclusion:** Quantised 30B models now get you ~98% of frontier-class accuracy - at a fraction of the latency, cost, and energy. For most local RAG or agent workloads, they're not just good enough - they're the new default.
Well done, Qwen - you really whipped the llama's ass! And to OpenAI: for your upcoming open model, please make it MoE, with toggleable reasoning, and release it in many sizes. *This* is the future!
While a fix is being implemented (https://github.com/ggml-org/llama.cpp/pull/12957), I want to leave the models up for visibility and continued discussion, but I also want to prevent accidental downloads of known-broken models (even though there are settings that can work around the issue at runtime for now).
With that goal in mind, I've enabled access requests. I don't really want your data, and I'm sorry there doesn't seem to be a way around collecting it. But that's what I'm gonna do for now, and I'll remove the gate once the fix is merged and verified and I've had a chance to re-convert and quantize!
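For anyone who still needs the files in the meantime: once you've accepted the access request on the model page, a gated repo downloads like any other with an authenticated client. A minimal sketch, with the repo ID as a placeholder:

```python
# Minimal sketch for pulling a gated Hugging Face repo after accepting the
# access request on the model page. Needs `pip install huggingface_hub` and a
# token from https://huggingface.co/settings/tokens (or `huggingface-cli login`).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="your-org/your-gguf-model",  # placeholder -- use the actual repo ID
)
print("Downloaded to:", local_dir)
```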