Sarath Shekkizhar
shekkizh
AI & ML interests
None yet
Recent Activity
posted an update 2 days ago
🙋🏽‍♂️ Is your "multi-agent" system really multi-agentic? Or is it just a modular setup with a bunch of different prompts? 🤨
I’ve had this discussion way too often, so I finally wrote it all down. If you’re building with agents, you need to read this.
Here’s the TLDR:
✅ True multi-agent systems require:
• Persistent, private state per agent
• Memory that impacts future decisions
• Adaptation based on past experiences (see the sketch below)
❌ Just having modular components, function calls, or multiple LLMs doesn't cut it. That's not multi-agentic. It's just pipelining.
🤝 The magic is in evolving relationships, context retention, and behavioral shifts over time.
🧠 If your agents aren’t learning from each other or changing based on past experience… you are missing the point.
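To make the checklist concrete, here's a minimal sketch of "persistent, private state + memory that shapes future decisions." This isn't from the post; the class and all names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Toy agent: persistent, private state plus episodic memory."""
    name: str
    _memory: list = field(default_factory=list)  # private, survives across calls

    def act(self, message: str) -> str:
        # Memory impacts the decision: the same input produces different
        # behavior once the agent has seen it before.
        seen = message in self._memory
        self._memory.append(message)
        if seen:
            return f"{self.name}: we've covered this; refining my last answer."
        return f"{self.name}: new topic, exploring from scratch."

planner = Agent("planner")
print(planner.act("design a caching layer"))  # explores
print(planner.act("design a caching layer"))  # adapts: behavior shifts
```

A stateless pipeline returns the same output for the same input every time; the behavioral shift above, multiplied across agents exchanging messages, is what makes a system multi-agentic rather than a pipeline.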
What do you think? Curious what patterns you're experimenting with 🧐
👉 Full post: https://shekkizh.github.io/posts/2025/04/multi-agents/
posted an update 3 days ago
Think AGI is just around the corner? Not so fast.
When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought, why not see how the model handles it?
Spoiler: Wordle turned out to be a surprisingly effective benchmark.
So Romain Cosentino, Ph.D., and I dug in and analyzed the results of several hundred runs.
🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks (see the Wordle feedback sketch below).
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality. Perception accuracy drops to near zero by the last turn 📉
🔗 Read our arXiv article for more details: https://www.arxiv.org/abs/2504.15434
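To see why Wordle is so context-dependent: every new guess has to be consistent with the feedback from all previous guesses, so the constraint set grows each turn. Here's a minimal sketch of the feedback rule itself (illustrative only, not our evaluation code):

```python
from collections import Counter

def wordle_feedback(guess: str, answer: str) -> str:
    """G = right letter, right spot; Y = right letter, wrong spot; B = absent."""
    feedback = ["B"] * len(guess)
    unmatched = Counter()
    # First pass: mark greens and count the answer letters left over.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: yellows consume the leftover letters.
    for i, g in enumerate(guess):
        if feedback[i] == "B" and unmatched[g] > 0:
            feedback[i] = "Y"
            unmatched[g] -= 1
    return "".join(feedback)

print(wordle_feedback("crane", "cigar"))  # GYYBB
```

An agent that misreads the board or drops a past (guess, feedback) pair can end up proposing words that are already ruled out, so both perception and context retention get stressed at once.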
posted an update 17 days ago
Some interesting architectural choices made in Llama 4 models -- were these key to the 10M context? Possibly 🤔
🔍 Takeaways:
🧩 Interleaved Attention without position encoding
- Llama 4 removes explicit positional encoding in some attention layers to boost performance on longer contexts.
- The principle may be similar to that of residual connections: these layers let the model attend to early tokens directly, without positional decay (see the sketch below).
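Here's a rough sketch of what that interleaving could look like. This is my reading, not the released internals; the every-fourth-layer pattern and all names are assumptions:

```python
import math
import torch

NUM_LAYERS, HEAD_DIM = 8, 64
# Hypothetical pattern: every 4th layer skips rotary position encoding
# ("NoPE"), so attention there carries no positional signal and early
# tokens stay reachable without positional decay.
USE_ROPE = [(i + 1) % 4 != 0 for i in range(NUM_LAYERS)]

def rope(x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
    # Standard rotary embedding: rotate pairs of dims by position-dependent angles.
    half = x.shape[-1] // 2
    freqs = pos[:, None] * torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    cos, sin = freqs.cos(), freqs.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def attention(q, k, v, layer_idx, pos):
    if USE_ROPE[layer_idx]:  # position-aware layers
        q, k = rope(q, pos), rope(k, pos)
    # NoPE layers fall through: attention with no positional bias at all.
    scores = q @ k.T / math.sqrt(HEAD_DIM)
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(16, HEAD_DIM)
out = attention(q, k, v, layer_idx=3, pos=torch.arange(16.0))  # layer 3: NoPE
```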
⚖️ Scaled Softmax to increase attention at inference time
- The max attention value (output of softmax) decreases as context size increases.
- Llama 4 incorporates a context-size dependent temperature in the softmax function to modify the slope of softmax, allowing the model to focus better on relevant tokens.
- This is done only at inference time -- my guess is it was a choice made after observations on eval datasets (see the sketch below).
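A sketch of the idea: first the observation (softmax mass spreads out as token count grows, so the max attention weight shrinks), then the fix (scale query logits by a slowly growing, position-dependent factor). The log-stepped form and the constants below are assumptions loosely based on public Llama 4 code, not an exact reproduction:

```python
import torch

# Observation: the max softmax output decays toward uniform as context grows.
for n in (1_000, 100_000, 1_000_000):
    print(n, torch.softmax(torch.randn(n), dim=-1).max().item())

def scale_queries(q: torch.Tensor, pos: torch.Tensor,
                  floor_scale: float = 8192.0,
                  attn_scale: float = 0.1) -> torch.Tensor:
    # Inference-time temperature: the scale grows in log-steps with position
    # (it stays exactly 1.0 until pos reaches ~floor_scale), steepening the
    # softmax so attention stays peaked at long context.
    scale = torch.log(torch.floor((pos + 1.0) / floor_scale) + 1.0) * attn_scale + 1.0
    return q * scale[:, None]

q = torch.randn(100_000, 64)
pos = torch.arange(100_000.0)
q_scaled = scale_queries(q, pos)  # larger logits at late positions
```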
What did you think of these choices?
shekkizh's activity
Dataset loading failing with HF load_dataset (1) · #3 opened 11 months ago by shekkizh
great evals (1) · #2 opened 12 months ago by gblazex
Script to reproduce MT-Bench (2) · #1 opened 12 months ago by MaziyarPanahi
Evaluation for 70B model FAILED (tenyx/Llama3-TenyxChat-70B) (5) · #719 opened 12 months ago by shekkizh
Update README.md (1) · #1 opened over 1 year ago by mostafagv