We’ve reached a point where on-device AI coding that is free, offline, and capable isn’t just a theoretical possibility; it’s sitting on my lap, barely warming my thighs. My local MacBook Air setup: Qwen3 Coder Flash with a 1M context window, driven by Cline inside VS Code. No internet, no cloud, no ID verification. This is the forbidden tech.

Current stats:

- All agentic tools work great locally: sandboxed execution and MCP are OK, and the model’s output stays precise.
- 17 tokens/sec. Not great, not terrible.
- 65K tokens of context for this run; the model can do 1M, but let’s be real, my MacBook Air would probably achieve fusion before hitting that smoothly.
- Standard backend, cache off for the test.
- All inference and function calling happen locally, offline, untethered. The cloud didn’t even get a memo.
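If you want to wire up something similar, here’s a minimal sketch. I’m not spelling out my exact backend above, so this assumes llama.cpp’s llama-server and a GGUF build of the model; the filename is illustrative, and the flags are just the usual suspects:

```bash
# Sketch, not my exact config: assumes llama.cpp built with Metal on the Air
# and a quantized Qwen3 Coder Flash GGUF (filename is a placeholder).
#   -c 65536  -> the 65K context used for this test (the model itself goes to 1M)
#   -ngl 99   -> push all layers onto the Air's GPU
llama-server -m ./Qwen3-Coder-Flash-Q4_K_M.gguf -c 65536 -ngl 99 \
  --host 127.0.0.1 --port 8080
```

llama-server speaks the OpenAI-compatible API, so in Cline you pick the “OpenAI Compatible” provider and point the base URL at http://127.0.0.1:8080/v1. Nothing leaves the laptop.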
I run Qwen3-Coder 480B locally on my Z8, with a 1-million token context window. It’s the equivalent of parallel-parking a Nimitz-class carrier in a kiddie pool. Thanks to whatever dark pact the llama.cpp, CUDA, and kernel folks signed, hybrid inferencing + VRAM↔RAM offload let me stream the model’s synapses across Xeon, RAM, and four lonely A6000s without summoning either the OOM killer or a small house fire.
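For flavor, here’s roughly what that hybrid split looks like on the command line. Again a sketch under assumptions, not my literal invocation: a CUDA build of llama.cpp, a quantized 480B GGUF (filename illustrative), and layer counts you’d tune to whatever actually fits in the four cards:

```bash
# Hybrid VRAM<->RAM offload sketch: the layers that fit go to the A6000s,
# everything else stays in system RAM and runs on the Xeon.
#   -ngl 40            -> offload only as many layers as the GPUs can hold (illustrative number)
#   --tensor-split     -> spread the offloaded layers evenly across the 4 cards
#   -t 32              -> CPU threads chewing on the layers left in RAM
#   -c 1000000         -> the 1M-token window; the KV cache is where the pain lives
llama-server -m ./Qwen3-Coder-480B-Q4_K_M.gguf -c 1000000 \
  -ngl 40 --tensor-split 1,1,1,1 -t 32 \
  --host 127.0.0.1 --port 8080
```

The whole trick is that nothing forces the model to live entirely in VRAM: whatever doesn’t fit spills into RAM and the CPU picks up the slack, which is exactly why the OOM killer stays asleep.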