
Manuel Faysse

As I was telling Jack, I loved the CDE concept and it was a big part of the inspiration here! Initially, we wanted to in-context learn the domain by concatenating disjoint small documents and late chunking them (as an alternative, for cases with no long docs, to what we do on long docs here), but sadly this worsened performance on long docs and we didn't pursue it much further! It could have enabled doing CDE in a single stage, and I still think there are things to do in this space!

Hey Tom! Thanks a lot!
Late Chunking actually doesn't change a thing compared to classical bi-encoders in terms of storage/inference cost! In both cases, the token embeddings are averaged over each chunk. In classic late chunking, you could decide to keep all token embeddings to truly be able to chunk dynamically, but most often you already know how you are going to chunk, so you can use the same chunks standard bi-encoders would use and get better contextualization (leading to better robustness to bad chunking; see the ablations in Section 6).
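If it helps to picture that pooling step, here's a rough sketch, purely illustrative (the encoder output, dimensions, and chunk spans are made-up placeholders, not our actual pipeline or models):

```python
import numpy as np

def late_chunk(token_embeddings: np.ndarray, chunk_spans: list[tuple[int, int]]) -> np.ndarray:
    """Mean-pool contextualized token embeddings over predefined chunk spans.

    `token_embeddings` is the (seq_len, dim) output of a long-context encoder
    run over the *whole* document, so every token already "sees" the full context.
    `chunk_spans` are (start, end) token indices, e.g. the same chunks a standard
    bi-encoder pipeline would embed independently.
    """
    return np.stack([token_embeddings[start:end].mean(axis=0) for start, end in chunk_spans])

# Toy usage: 3 chunks -> 3 vectors, i.e. exactly the storage footprint
# of a standard bi-encoder embedding each chunk on its own.
doc_tokens = np.random.randn(512, 768)            # stand-in for encoder output
spans = [(0, 200), (200, 380), (380, 512)]        # hypothetical chunk boundaries
chunk_embeddings = late_chunk(doc_tokens, spans)  # shape (3, 768)
```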
In our case, we have [sep] tokens between the chunks, and we find that chunk representations improve a lot when they learn to stay distinct from adjacent document chunks. This couldn't be done without the model knowing the chunk boundaries beforehand. After averaging the tokens between the [sep] tokens, we thus get exactly the same embedding sizes as a standard bi-encoder.
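The [sep]-based variant then just amounts to deriving the chunk spans from the separator positions before pooling, something like this toy illustration (the sep id and token ids are invented for the example):

```python
def spans_from_sep(token_ids: list[int], sep_id: int) -> list[tuple[int, int]]:
    """Turn [sep]-delimited token ids into (start, end) chunk spans,
    skipping the [sep] tokens themselves before pooling."""
    spans, start = [], 0
    for i, tok in enumerate(token_ids):
        if tok == sep_id:
            if i > start:
                spans.append((start, i))
            start = i + 1
    if start < len(token_ids):
        spans.append((start, len(token_ids)))
    return spans

# Hypothetical ids: two chunks separated by a [sep] token (id 3 here).
ids = [5, 8, 9, 3, 7, 7, 2]
print(spans_from_sep(ids, sep_id=3))  # [(0, 3), (4, 7)]
```

Those spans feed into the same mean pooling as above, so you still end up with one vector per chunk and nothing extra to store.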
This is nice because it really is a drop-in replacement. Having said that, we believe the models we trained still need a bit of improvement for production use cases; we were mainly happy to show that the research direction looks very promising!
Cheers @tomaarsen, and thanks for the kind words as always @merve!
