Long context

#2
by danielus - opened

Llama 3 models have a fairly limited context window (8k tokens), which is often not enough to fit all the code the model needs to understand how a codebase works. How have you addressed, or do you plan to address, this problem?

@danielus I really appreciate you pointing out that the limiting factor for these foundation models is Llama 3's limited context window (8k tokens).

Looking at my Mermaid-Llama-3 line (8B, 7B, 6B, 5B, 4B, 3B):
My ongoing project involves working with some really influential people in the LLM world to bring Mermaid to the smallest size with the highest performance possible. This is in preparation for what's coming next.

Daniel, this was supposed to be a secret, but I have plans to work on longer-context models and larger, smarter models soon, which will address this problem and drastically improve diagram quality.
These are just baby steps I need to take due to my limited finances, as I am unemployed and actively seeking employment at this time.

My goal is to bring the highest-quality model to the most efficient state possible in terms of speed, output quality, and VRAM footprint, so that my models fit on common consumer-grade hardware such as 8GB, 12GB, 16GB, and 24GB VRAM graphics cards.

To answer your question, here is how you can currently address this problem, and what most of the people who have contacted me about using my models are doing:

Option 1: Loop(Extraction -> Aggregation)
This method is lossy by itself, but done right it produces really good high-level system architecture and flow diagrams of how the system and its files interact.
You use a prompt, or a sequence of prompts, to extract key details from each file or module based on your design, aggregate them into a system-level abstraction, and then send that to my model for the diagrams, until I can make a bigger model with a larger context.

Let me give you a more concrete example: I worked with someone who was making a "UI Experience Flow Diagram", where the extraction prompt was to aggregate only the parts of the code that affect what the user sees. I cannot share the prompt, but I can tell you it was as simple as filling in a structured output template with the GPT-4 API. This aggregated "summary of the system at the user-experience level" is then sent to my model to create a diagram from that context. A rough sketch of the loop is below.
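To make the pipeline more tangible, here is a minimal sketch of the Loop(Extraction -> Aggregation) idea. It assumes the OpenAI Python client for the extraction and aggregation steps and a hypothetical local endpoint serving a Mermaid-Llama model; the prompts, the `MERMAID_ENDPOINT` URL, and the helper names are placeholders I made up, not the actual prompts mentioned above.

```python
# Minimal sketch of Loop(Extraction -> Aggregation); prompts and endpoint are placeholders.
from pathlib import Path

import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MERMAID_ENDPOINT = "http://localhost:8000/generate"  # hypothetical local Mermaid-Llama server


def extract_summary(source: str) -> str:
    """Extract the key details of one file (the lossy 'Extraction' step)."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Summarize only the responsibilities, "
             "public interfaces, and interactions of this file. Be terse."},
            {"role": "user", "content": source},
        ],
    )
    return resp.choices[0].message.content


def aggregate(summaries: list[str]) -> str:
    """Merge the per-file summaries into one system-level abstraction."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Combine these file summaries into one "
             "system-level description of how the components interact."},
            {"role": "user", "content": "\n\n".join(summaries)},
        ],
    )
    return resp.choices[0].message.content


def diagram(system_summary: str) -> str:
    """Send the aggregated summary to the Mermaid model for the final diagram."""
    resp = requests.post(MERMAID_ENDPOINT, json={"prompt": system_summary}, timeout=120)
    resp.raise_for_status()
    return resp.json()["text"]


if __name__ == "__main__":
    files = sorted(Path("my_project").rglob("*.py"))
    summaries = [extract_summary(f.read_text()) for f in files]
    print(diagram(aggregate(summaries)))
```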

Option 2: RAG, with each file as its own flow diagram.
This requires the design principle known as the single responsibility principle: each file does one and only one thing, which leads to smaller files and clear abstraction at the module level. This has been the most effective approach, as RAG and knowledge graphs have proven incredibly powerful with LLMs for effective retrieval. Many people use this method and get good results, being able to query any part of the program. I haven't heard of anyone using it for full systems yet, but please keep me posted; I love hearing how people are prompt engineering and building cool systems with these Mermaid models, and about their successes. A sketch of the retrieval side is below.
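As an illustration of the retrieval side of Option 2, here is a minimal sketch that indexes one pre-generated diagram per file with sentence-transformers embeddings and retrieves the most relevant ones for a question. The embedding model, the sample diagrams, and the idea of embedding the diagram text directly are my own assumptions, not a description of any particular user's setup.

```python
# Minimal sketch of per-file diagram retrieval (the RAG side of Option 2).
# Assumes one Mermaid diagram has already been generated per source file.
import numpy as np
from sentence_transformers import SentenceTransformer

# file path -> its pre-generated Mermaid flow diagram (placeholder examples)
diagrams = {
    "auth/login.py": "flowchart TD; A[Receive credentials] --> B[Validate] --> C[Issue token]",
    "billing/invoice.py": "flowchart TD; A[Load order] --> B[Compute totals] --> C[Render PDF]",
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, commonly used embedder
keys = list(diagrams)
embeddings = model.encode([diagrams[k] for k in keys], normalize_embeddings=True)


def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Return the file(s) whose diagram best matches the question."""
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = embeddings @ q  # cosine similarity, since embeddings are normalized
    best = np.argsort(scores)[::-1][:top_k]
    return [keys[i] for i in best]


if __name__ == "__main__":
    for path in retrieve("How does the system validate a user's credentials?"):
        print(path)
        print(diagrams[path])
```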

Best of luck Daniel.

@TroyDoesAI Wow, thanks for the complete and detailed answer. At the moment my question was pure curiosity; I have no real intention of getting into complex algorithms to create diagrams from large portions of text. Also because, from my experience with LLMs, it seems that as the input size increases, the output size does not increase proportionally. To give you an example, consider this prompt:

*1,000-character-long document*
Summarize the following text in bullet points

The language model will produce 4-5 key points from the text.

But if I now paste a 10,000-character text, the language model does not generate 40 key points, but still about 5. This implies an inherent loss of detail.

This argument can be extended to your models: even if you could feed in 128k tokens of code, the AI will not necessarily generate a map as accurate as the one it would create from 1k tokens of code. So in my opinion the solution is not to increase the capacity of the model, but rather to solve the problem with an algorithm.

It would be different if the fine-tuning of the model were done with a very particular dataset. I imagine something like:

Context:

  • 100k code tokens of the entire codebase.

Algorithm:

  • The specific function whose diagram I want created.

By doing so, the language model would use the context to retrieve missing details such as library and API calls. A similar system was adopted by the Airoboros language model (https://huggingface.co/jondurbin/airoboros-34b-3.3).
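Just to illustrate the dataset shape described here (a hypothetical sample I'm inventing, not an actual format used by Mermaid-Llama or Airoboros), one fine-tuning example could look something like this:

```python
# Hypothetical shape of one fine-tuning sample for the "context + target function" idea.
# Field names, file name, and diagram content are illustrative only.
sample = {
    "context": open("codebase_dump.txt").read(),   # ~100k tokens of the whole codebase
    "algorithm": "process_payment",                # the specific function to diagram
    "response": (
        "flowchart TD\n"
        "  A[process_payment called] --> B{card valid?}\n"
        "  B -- yes --> C[charge via payments_api.charge]\n"
        "  B -- no --> D[raise InvalidCardError]\n"
    ),
}
```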

In general, it fascinates me to see how a language model is able to create concept maps, so I am very interested in this field. I hope you find investors and funding to continue contributing as you already are, perhaps by training larger models (70B, or several MoEs) that are certainly more capable and able to capture more features of the code.

F for the discussion
I would have liked to know your opinion on this.

danielus changed discussion status to closed
