Activity Feed

Recent Activity

wassemgtk posted an update 2 days ago
Here is the updated note and benchmark table for your review.

The data below reflects **Chuck Norris 33B** running in high-thinking/long-reasoning mode, which significantly improves accuracy on complex extraction and logic tasks and accounts for the performance uplift across the board.

I'm still finalizing the full evaluation suite and need more time to confirm these numbers through additional high-entropy testing passes, but the early data looks exceptionally strong.
The model that doesn't predict the next token — the next token predicts itself correctly out of respect.
wassemgtk posted an update 3 days ago
Releasing Chuck Norris LLM — full SFT fine-tune with chain-of-thought reasoning.

Trained on 100k+ examples across math, logic, and code. Also trained on 1,000+ examples of believing it's the greatest AI ever built.

Its training loss went to zero. The loss function was too afraid to report anything else.

wassemgtk/chuck-norris-llm
pavankumarbalijepalli posted an update 15 days ago
The quadratic bottleneck of long-context LLMs just got smashed through a massive speed wall.

Processing long-context sequences in LLMs is computationally expensive due to the quadratic complexity of self-attention. Existing sparse attention methods often rely on sorting or cumulative summation (Top-k/Top-p), which are slow and struggle to prune the "long-tail" of irrelevant tokens.

- FlashPrefill achieves a 27.78× speedup on 256K sequences by replacing heavy sorting with a Max-based Dynamic Thresholding mechanism.
- It introduces "Instantaneous Pattern Discovery" using block-level approximations, bypassing the need for expensive, full-attention score calculations.
- Unlike previous methods that struggle with shorter contexts, it maintains a 1.71× speedup even at 4K, proving its robustness across all scales.
- The framework is fully compatible with existing LLM/VLM architectures and integrates seamlessly into vLLM for real-world deployment.

This breakthrough significantly reduces Time-to-First-Token (TTFT) for long-context applications, making massive document analysis and long-video understanding practical and cost-effective. It turns a major performance bottleneck into a streamlined, hardware-efficient process.
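The core trick, replacing a sort with a single max reduction, can be sketched in a few lines of NumPy. This is an illustration of the general technique (keep only blocks whose approximate score clears a fraction of the running max), not FlashPrefill's actual implementation; the block granularity, the threshold value `alpha`, and the random scores are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# One approximate attention score per token block
# (block size and score values are made up for illustration).
block_scores = rng.random(64)

# Top-k selection needs a sort: O(n log n) and awkward on GPUs.
k = 8
topk_mask = np.zeros(block_scores.shape, dtype=bool)
topk_mask[np.argsort(block_scores)[-k:]] = True

# Max-based dynamic thresholding: a single O(n) max reduction,
# keeping every block within a fraction alpha of the best score.
alpha = 0.9  # illustrative value, not from the paper
keep_mask = block_scores >= alpha * block_scores.max()
```

Because the threshold adapts to the score distribution, the number of kept blocks shrinks automatically when a few blocks dominate, which is exactly the "long-tail" pruning the post describes.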

How much compute are we wasting on "long-tail" tokens that don't actually matter? FlashPrefill suggests the answer is: a lot.

#AI #LLMs #MachineLearning #DeepLearning #TechInnovation #GPUComputing

Source: https://arxiv.org/pdf/2603.06199
alvarobartt posted an update 19 days ago
Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr
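For orientation, a transcription request to an OpenAI-compatible Chat Completions endpoint typically carries audio as an `input_audio` content part. The sketch below builds such a payload; the deployment name, prompt, and exact field shape are placeholders following the generic OpenAI convention, not values from the VibeVoice docs (the link above has the real deployment details).

```python
# Request payload for an OpenAI-compatible Chat Completions endpoint.
# Deployment name and prompt are placeholders; the audio content part
# follows the generic OpenAI `input_audio` shape, which the actual
# Foundry deployment may or may not mirror exactly.
payload = {
    "model": "vibevoice-asr",  # placeholder deployment name
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {"data": "<base64-encoded wav>",
                                    "format": "wav"},
                },
                {
                    "type": "text",
                    "text": "Transcribe with speaker labels and timestamps.",
                },
            ],
        }
    ],
}
```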
alvarobartt posted an update about 2 months ago
💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
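The back-of-the-envelope arithmetic behind any KV-cache estimate is simple enough to sketch. This is the standard formula, not hf-mem's actual code, and the model shape below is an assumed Llama-3-8B-like configuration for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   max_model_len, batch_size, dtype_bytes=2):
    """Standard KV-cache size: one K and one V tensor per layer,
    each of shape [batch, max_model_len, num_kv_heads, head_dim]."""
    return (2 * num_layers * num_kv_heads * head_dim
            * max_model_len * batch_size * dtype_bytes)

# Assumed Llama-3-8B-like shape: 32 layers, 8 KV heads (GQA),
# head_dim 128, fp16 cache (2 bytes), 8K context, batch size 1.
size = kv_cache_bytes(32, 8, 128, 8192, 1)
print(size / 2**30, "GiB")  # 1.0 GiB
```

Halving `dtype_bytes` (e.g. an fp8 KV cache via `--kv-cache-dtype`) halves the result, which is why the dtype flag matters as much as context length.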
mlabonne posted an update 3 months ago
DavidVivancos posted an update 4 months ago
juhoinkinen posted an update 4 months ago
**AI4LAM's annual conference, AI Everywhere, All at Once**
**December 3–5, 2025, British Library, London**

See the conference programme: 👉 https://www.conftool.org/fantastic-futures-2025/sessions.php

Some programme items related to NatLibFi/Annif:
• Workshop:
  • Evaluating Automated Subject Indexing Methods, Maximilian Kähler
• Presentations:
  • Autocat Cataloguing Assistant
  • The usage of hardware resources for automatic subject cataloguing at the German National Library – an analysis and outlook for future challenges, Christoph Poley
• Posters:
  • AI-Powered Subject Indexing in the Archives – Piloting Finto AI at the Finnish Literature Society, Milla Eräsaari and Teemu Hirvonen
  • From Annotation to Insight: Human-in-the-Loop Machine Learning for Historical Archives in HAICu WP2, C.A. Romein and others
DavidVivancos posted an update 4 months ago