Jaward Sesay

Jaward

AI & ML interests

Building Lectūra Labs | CS Grad Student @BIT | AI/ML Research: Autonomous Agents, LLMs | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Recent Activity

liked a model 4 days ago

nvidia/canary-1b-flash

liked a model 4 days ago

patmodels/bpm

posted an update 7 days ago

It’s absolutely mind blowing - the work Dynamics Lab is doing!! With just a single input image and in a few seconds, their new world engine model (Mirage 2) can generate a whole new interactive world that’s physics informed and fully explorable in real-time🤯 Try it yourself: https://demo.dynamicslab.ai/chaos

View all activity

Organizations

posted an update 7 days ago

Post

6885

It’s absolutely mind blowing - the work Dynamics Lab is doing!!
With just a single input image and in a few seconds, their new world engine model (Mirage 2) can generate a whole new interactive world that’s physics informed and fully explorable in real-time🤯
Try it yourself: https://demo.dynamicslab.ai/chaos

1 reply

replied to their post 17 days ago

you're welcome, nice work.

posted an update 19 days ago

Post

4178

fascinating read!
staying bullish on search with rl might just help us get rid of hallucination entirely. I really like their approach:
1) <think>on prompt/context && what u know </think>
2) self <search>when u don’t know</search> (iteratively) with no external tool
3) <information>cite sources to support claim(s)</information>
4) <answer>final answer</answer>
their rl training was done cost efficiently too, see code: https://github.com/TsinghuaC3I/SSRL

2 replies

posted an update about 2 months ago

Post

3256

Towards batch sizes too small to meter🎉 beautiful work! And my personal favorite so far - I adore peak performance at small/nano scale. Everyone deserves to run/train AGI locally:) our data, our god model!
They showed that:
- you can train LLMs (upto 1B params) with as low as batch_size=1. This is unconventional given small batch sizes can lead to unstable/spiky training runs.
- you can have a stable train run with just vanilla SGD(stochastic gradient descent), no momentum required🤯
- small batch sizes are more robust to hyperparameters (i.e no worries with initialization)
- smaller batch sizes outperforms (“better per-Flops performance”) larger batch sizes.

“We recommend that practitioners training large models in memory-constrained settings exploit the benefits of small batch sizes rather than trying to emulate the large batch size setting (e.g., through gradient accumulation) typically used in industry.”

I’ve been doing this for ages - my mantra: all my experiments must scale on my 8gb ram m2 before moving to gpu. IOW I love being gpu poor. Checkout my nanoAI algo repo: https://github.com/Jaykef/ai-algorithms, all notebooks run on memory as low as 8gb ram

posted an update 2 months ago

Post

2056

I played around with the new RXTX paper (XX^T) and was able to train nanogpt with 4x4 RXTX matmuls in both attention layer and optimizer🤕
It just works (well I had to add some guardrails) but still saves 5% of memory usage:
The Patch:
- Computes attention scores with a 4x4 blockwise RXTX matmuls (no pytorch dot prod)
- Handles arbitrary sequence lengths by padding to the nearest multiple of 4.
- An RXTX variant of shampoo with params reshaped into 4x4 blocks during each optimizer step.
- Uses 5% less ops
Code: https://github.com/Jaykef/ai-algorithms/blob/main/nanogpt-rxtx.ipynb
Paper: https://arxiv.org/pdf/2505.09814

posted an update 2 months ago

Post

2333

Mind2Web 2 is out - this time featuring eval and benchmark for deep research🔥
Paper: Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge (2506.21506)
Project: https://osu-nlp-group.github.io/Mind2Web-2/

posted an update 2 months ago

Post

3471

Awesome intro to LLM course "Language Modeling from Scratch" by stanford. love the aesthetics behind the lecture notes, notes-in-code genius idea👍
Course site: https://stanford-cs336.github.io/spring2025/
Repo: https://github.com/stanford-cs336/spring2025-lectures
Videos: https://www.youtube.com/playlist?list=PLoROMvodv4rOY23Y0BoGoBGgQ1zmU_MT_

2 replies

posted an update 3 months ago

Post

1455

not sure of what to make of this but solving autonomous/selective reflection seems like a big deal in current agent frameworks. We did hit on this with iterative self-refinement in our AutoAgents framework (https://ijcai.org/proceedings/2024/0003.pdf). Nice read, looking forward to the code.
Paper: Scaling Test-time Compute for LLM Agents (2506.12928)

replied to their post 3 months ago

will cook a deep dive tutorial on dfms sometime next week, the math is nolonger scary after taking this course:)
https://diffusion.csail.mit.edu/

posted an update 3 months ago

Post

1408

You can now edit operations with a discrete flow model, supercool👍! It's amazing to see the progress on DFM within one year since its introduction - literally my litmus test for how fast the field is progressing:
1st Introduced (2024): https://arxiv.org/abs/2402.04997
Discrete Flow Matching (2024): https://arxiv.org/abs/2407.15595
Edit Discrete Flow (2025): https://arxiv.org/pdf/2506.09018
Looking forward to a SaaS level reach like that of dLLMs e.g Mercury by inception labs 🚀

1 reply

posted an update 3 months ago

Post

1187

bumped into one of the OG reads today!! handwriting generation & synthesis is still my favorite application of RNNs - supper amazed at how such a small model (3.6M params), trained overnight on cpu could reach such peak performance. Huge credit to the data (IAM-OnDB🔥) which was meticulously curated using an infra-red device to track pen position.
Try demo here: https://www.calligrapher.ai/
Code: https://github.com/sjvasquez/handwriting-synthesis

posted an update 4 months ago

Post

1907

I gave rectified flow a try, so here is nanoRF - a lightweight implementation of a Rectified Flow Transformer model, ~ 618k parameters, 6 layers deep, dim 64, patch size 4, learning rate 5e-4 trained on my 8bg ram m2 macbookair for 2k epochs.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/nanoRF.ipynb
See demo: https://x.com/Jaykef_/status/1923718725578129838
Reference Paper: https://arxiv.org/abs/2403.03206.

posted an update 4 months ago

Post

1792

Huge Win Today 🎉🎉
Our team “Afri-Aya” just won this year’s CohereAI Aya Expedition Challenge. Our work focused on 1) curating and evaluating culturally relevant African vision dataset then 2) Fine-tuning the Aya vision model to support underrepresented languages in Africa. I represented my beloved Sierra Leone with the Krio language. Krio is a beautiful first language spoken by a majority of our population. It was a humbling and inspiring experience to have it recognized, thanks to the relentless effort of everyone on the team. Special thanks to BK for offering me this opportunity 🫡 and to Cohere AI for such an amazing global research expedition🙏

posted an update 4 months ago

Post

438

Officially kicking off my startup today🎉
Join me in building the future of learning: Lectūra - an advanced multi-agent software for adaptive personalized learning experience. Research will focus on building tools that empower individual learners to master needed self-taught skills with the help of AI.
Read more: https://lecturalabs.com/
Feel free to reach out via the mentioned email and follow the official account for updates: https://x.com/lectura_ai

Curiosity has a voice, let it teach you. Generate Lectures. Customize Instructors. Get Real-time Personalized Learning.

posted an update 4 months ago

Post

3314

finally, a course that makes diffusion math much easier to grasp, well done 👍 https://diffusion.csail.mit.edu/

1 reply

replied to their post 4 months ago

if you like this work, kindly upvote the paper, thanks: https://huggingface.co/papers/2505.02707

posted an update 4 months ago

Post

693

Thrilled to share our latest work: Voila - a family of fully opensourced voice models for real-time autonomous convos and role-play, some of our major contributions include 🧵:
1) An End-to-End Full-Duplex Arch: that directly processes & handles simultaneous audio token streams from user to model and vice versa.
2) Voila-Tokenizer: A 100K-hour trained tokenizer with interleaved alignment (audio & text) that distills semantic/acoustic tokens via RVQ.
3) Text-Audio Interleaved Alignment: We leveraged a fine-grained alignment of text and audio tokens that allows synchronization and expressiveness for tasks like ASR (WER 2.7%) and TTS (WER 2.8%).
4) Voice Customization: Supports 1M+ pre-built voices and 1 shot voice clone from 10s audio clips using Wespeaker embeddings.

Models: maitrix-org/voila-67e0d96962c19f221fc73fa5
Code: https://github.com/maitrix-org/Voila
Demo: maitrix-org/Voila-demo
Project page: maitrix-org/Voila-demo

2 replies

posted an update 4 months ago

Post

1297

late submission but managed to cook up a nascent Feynman-inspired agent app for Microsoft’s AI Agent hackathon, wish me luck lol. @clem ps I need this on gpu, thank you:)
Try Demo: Jaward/Professor-AI-Feynman
Code: https://github.com/Jaykef/professor-ai-feynman

3 replies

posted an update 4 months ago

Post

3133

Finally my first solo preprint is here:) a love letter to the field. Nothing much lol, this is just me trying to finetune my understanding of research behind the recent breakthroughs in reasoning models. It’s a preprint targeting beginners in the field - will eventually make necessary changes later. In the meantime have fun with it:)
Download: https://github.com/Jaykef/Jaykef/blob/main/papers/The-Dawn-of-Thinking-Machines.pdf

posted an update 5 months ago

Post

2263

New reasoning algo just dropped: Adaptive Parallel Reasoning
“we propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations.”
Paper: https://arxiv.org/pdf/2504.15466
Code: https://github.com/Parallel-Reasoning/APR

Jaward Sesay

AI & ML interests

Recent Activity

Organizations

Jaward's activity