Massimo Roberto Scamarcia

mrs83

AI & ML interests

Natural Language Processing, Text Generation, Question Answering, Data Augmentation, Knowledge Transfer, Chain-of-Thought, ResearchOps, MLOps

Recent Activity

new activity 1 day ago

ethicalabs/Kurtis-EON1:Echo-DSRN - Triton Kernel Benchmark Report - PyTorch (native) vs Triton Legacy (sequential) vs Triton 3-Pass (new)

new activity 1 day ago

ethicalabs/Kurtis-EON1:SFT/Alignment - Phase 007-06-MLP8: ethicalabs/Kurtis-EON1-SFT Mix (1 epoch)

updated a model 2 days ago

ethicalabs/Echo-DSRN-486M-v0.7.6-SFT

View all activity

Organizations

New activity in ethicalabs/Kurtis-EON1 1 day ago

Echo-DSRN - Triton Kernel Benchmark Report - PyTorch (native) vs Triton Legacy (sequential) vs Triton 3-Pass (new)

#10 opened 2 days ago by

mrs83

SFT/Alignment - Phase 007-06-MLP8: ethicalabs/Kurtis-EON1-SFT Mix (1 epoch)

#9 opened 2 days ago by

mrs83

updated a model 2 days ago

ethicalabs/Echo-DSRN-486M-v0.7.6-SFT

Text Generation • 0.5B • Updated 1 day ago • 217

updated a collection 2 days ago

Kurtis-EON1

Collection

Language Model • 12 items • Updated 2 days ago

published a model 2 days ago

ethicalabs/Echo-DSRN-486M-v0.7.6-SFT

Text Generation • 0.5B • Updated 1 day ago • 217

updated a model 2 days ago

ethicalabs/Kurtis-EON1

Text Generation • Updated 1 day ago • 5

updated a collection 2 days ago

Kurtis-EON1

Collection

Language Model • 12 items • Updated 2 days ago

updated a dataset 2 days ago

ethicalabs/Kurtis-EON1-SFT

Viewer • Updated 2 days ago • 200k • 13

updated a collection 2 days ago

Kurtis-EON1

Collection

Language Model • 12 items • Updated 2 days ago

reactedto SeaWolf-AI's post with 🔥 6 days ago

Post

8141

🚀 Introducing MARL — Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning

Now available on PyPI · GitHub · ClawHub · HuggingFace
AI models sense they could be wrong, but they can't actually fix what's broken.

🤗 Live A/B test: VIDraft/MARL

We evaluated 9 SOTA models (GPT-5.2, Claude Opus 4.6, Gemini 3 Pro, etc.) across 1,800 assessments in FINAL Bench and found a 39.2%p gap between "recognizing potential errors (MA=0.694)" and "actually finding and fixing them (ER=0.302)."

MARL (Model-Agnostic Runtime Middleware for LLMs) was built to close this metacognitive gap. It decomposes a single LLM call into a 5-stage expert pipeline (Hypothesis → Solver → Auditor → Adversarial Verifier → Synthesizer), transforming "answer in one shot" into "think, doubt, correct, and rewrite."

No weight modification — works instantly with GPT-5.4, Claude, Gemini, Llama, or any OpenAI API-compatible LLM by changing one line: base_url. Ships with 9 domain-specific emergence engines (invention, pharma, genomics, chemistry, ecology, law, and more — 5,538 expert data items) activated by a simple tag like model="gpt-5.4::pharma".

pip install marl-middleware

MARL is also officially registered on ClawHub, the skill marketplace of OpenClaw — an AI agent platform with 260K+ developers and 3,200+ skills. It's the first middleware in the Reasoning Enhancement category. One command — clawhub install marl-middleware — gives your AI agent a metacognition upgrade.

📝 Technical deep dive: https://huggingface.co/blog/FINAL-Bench/marl-middleware
📦 PyPI: https://pypi.org/project/marl-middleware/
🐙 GitHub: https://github.com/Vidraft/MARL
🦀 ClawHub: https://clawhub.ai/Cutechicken99/marl-middleware

#MARL #LLM #Hallucination #Metacognition #MultiAgent #AIMiddleware #FINALBench #OpenClaw #ClawHub #PyPI #AGI #HuggingFace #ReasoningAI #SelfCorrection #GlassBoxAI

updated a collection 6 days ago

Kurtis-EON1

Collection

Language Model • 12 items • Updated 2 days ago

liked a Space 7 days ago

Leaderboard of Smol Worldcup

📈

Benchmark Evaluation for Small LLMs - Leaderboard

New activity in ethicalabs/Kurtis-EON1 7 days ago

SFT/Alignment - Phase 007-02-MLP8: mlabonne/FineTome-100k-dedup (1000 steps + 500 steps)

#8 opened 7 days ago by

mrs83

reactedto SeaWolf-AI's post with 🔥 7 days ago

Post

11061

🏟️ Smol AI WorldCup: A 4B Model Just Beat 8B — Here's the Data

We evaluated 18 small language models from 12 makers on 125 questions across 7 languages. The results challenge the assumption that bigger is always better.

Community Article: https://huggingface.co/blog/FINAL-Bench/smol-worldcup
Live Leaderboard: ginigen-ai/smol-worldcup
Dataset: ginigen-ai/smol-worldcup

What we found:

→ Gemma-3n-E4B (4B, 2GB RAM) outscores Qwen3-8B (8B, 5.5GB). Doubling parameters gained only 0.4 points. RAM cost: 2.75x more.

→ GPT-OSS-20B fits in 1.5GB yet matches Champions-league dense models requiring 8.5GB. MoE architecture is the edge AI game-changer.

→ Thinking models hurt structured output. DeepSeek-R1-7B scores 8.7 points below same-size Qwen3-8B and runs 2.7x slower.

→ A 1.3B model fabricates confident fake content 80% of the time when prompted with nonexistent entities. Qwen3 family hits 100% trap detection across all sizes.

→ Qwen3-1.7B (1.2GB) outscores Mistral-7B, Llama-3.1-8B, and DeepSeek-R1-14B. Latest architecture at 1.7B beats older architecture at 14B.

What makes this benchmark different?

Most benchmarks ask "how smart?" — we measure five axes simultaneously: Size, Honesty, Intelligence, Fast, Thrift (SHIFT). Our ranking metric WCS = sqrt(SHIFT x PIR_norm) rewards models that are both high-quality AND efficient. Smart but massive? Low rank. Tiny but poor? Also low.

Top 5 by WCS:
1. GPT-OSS-20B — WCS 82.6 — 1.5GB — Raspberry Pi tier
2. Gemma-3n-E4B — WCS 81.8 — 2.0GB — Smartphone tier
3. Llama-4-Scout — WCS 79.3 — 240 tok/s — Fastest model
4. Qwen3-4B — WCS 76.6 — 2.8GB — Smartphone tier
5. Qwen3-1.7B — WCS 76.1 — 1.2GB — IoT tier

Built in collaboration with the FINAL Bench research team. Interoperable with ALL Bench Leaderboard for full small-to-large model comparison.

Dataset is open under Apache 2.0 (125 questions, 7 languages). We welcome new model submissions.