Amazing work! Loved the trick to register the custom types as Pytrees.
If I understand correctly, the KV cache is not used in the jitted code. Did you explore using a fixed-size StaticCache instead, or will that be part of your sequence generation follow-up?
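As a rough illustration of the fixed-size idea, here is a minimal JAX sketch. It is not the transformers `StaticCache` API and not the code from the post: `FixedSizeCache` and `append` are hypothetical names, and the single-head `(max_len, head_dim)` layout is an assumption made to keep the example short. The point is that a preallocated cache registered as a pytree keeps the same shapes on every step, so the jitted function does not need to be retraced as the sequence grows.

```python
import jax
import jax.numpy as jnp
from jax import tree_util


class FixedSizeCache:
    """Hypothetical fixed-size KV cache (not the transformers StaticCache class)."""

    def __init__(self, keys, values, length):
        self.keys = keys      # preallocated, shape (max_len, head_dim)
        self.values = values  # preallocated, shape (max_len, head_dim)
        self.length = length  # scalar int32 array: number of filled slots


def _flatten(cache):
    # All three fields are array leaves; no static auxiliary data is needed.
    return (cache.keys, cache.values, cache.length), None


def _unflatten(aux, children):
    return FixedSizeCache(*children)


# Register the custom type so jax.jit can take it as an argument and return it.
tree_util.register_pytree_node(FixedSizeCache, _flatten, _unflatten)


@jax.jit
def append(cache, k, v):
    # Write the new key/value row at the current position.
    # Shapes stay constant, so every call sees the same input structure.
    keys = cache.keys.at[cache.length].set(k)
    values = cache.values.at[cache.length].set(v)
    return FixedSizeCache(keys, values, cache.length + 1)


cache = FixedSizeCache(
    jnp.zeros((4, 2)), jnp.zeros((4, 2)), jnp.array(0, dtype=jnp.int32)
)
cache = append(cache, jnp.ones(2), 2 * jnp.ones(2))
cache = append(cache, jnp.ones(2), 2 * jnp.ones(2))
```

A real cache would also need per-layer and per-head dimensions plus an attention mask over the unfilled slots; those are left out here.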
Hi Pedro,
Yes, that is a good idea. I didn't know about StaticCache; let me explore that. Thanks!