A book feels like it would require a lot of time formalizing things to convey them to the reader, and I'm not experienced enough to put things down in a structured manner that would be interesting. I want to keep things mostly informal, go into things deeply but not too deeply, touch on a lot of topics that are quite different and not necessarily related to each other, and just have it out there where people can interact and point things out when I'm wrong.
Aryan V S (a-r-r-o-w)
Recently, I've been focusing my learning on the following topics:
- PyTorch internals, specifically the inductor system (roughly ~1 month of experience)
- Triton internals (~8 months of experience)
- CUDA (~3 months of experience)
- Understanding fusion patterns in compilers and how to improve them (~1 month of experience)
- Parallelism strategies for large-scale inference optimization (~6-7 months of experience)
I thought it would be nice to document these somewhere, for no particular reason. Maybe someone will find it useful? It's also because I want to get into the habit of writing, but haven't had the motivation to do so. Maybe writing short, informal posts will help build the habit.
Since I don't have a personal site, and don't plan to create one in the near future, I think HF posts are best suited for short and informal documentation to share my little discoveries and learnings. If you're interested, strap in!
The first post in this series will be a basic study of PyTorch's float32 matmuls and their Triton implementation (nothing much, just the tutorial available on the website), plus a short dive into TF32 and a TFLOPS comparison on an A100 machine.
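As a teaser, the core of that comparison is just flipping the `torch.backends.cuda.matmul.allow_tf32` flag and timing the same matmul under both settings. Here's a minimal sketch of how I'd set it up (the matrix size, iteration count, and timing approach are placeholder choices, not what the actual post will use):

```python
import time
import torch

def benchmark_matmul(n: int = 8192, iters: int = 50) -> float:
    """Time n x n float32 matmuls on the GPU and return achieved TFLOPS."""
    a = torch.randn(n, n, device="cuda", dtype=torch.float32)
    b = torch.randn(n, n, device="cuda", dtype=torch.float32)
    # Warm up so one-time kernel selection/caching doesn't skew the timing.
    for _ in range(5):
        torch.mm(a, b)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        torch.mm(a, b)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    flops = 2 * n**3 * iters  # ~2*n^3 FLOPs per n x n matmul
    return flops / elapsed / 1e12

if __name__ == "__main__":
    # Strict IEEE float32 matmul.
    torch.backends.cuda.matmul.allow_tf32 = False
    print(f"float32: {benchmark_matmul():.1f} TFLOPS")

    # TF32 matmul (tensor cores, reduced mantissa precision).
    torch.backends.cuda.matmul.allow_tf32 = True
    print(f"tf32:    {benchmark_matmul():.1f} TFLOPS")
```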