bartowski 
posted an update Aug 17
So turns out I've been spreading a bit of misinformation when it comes to imatrix in llama.cpp

It starts true; imatrix runs the model against a corpus of text and tracks the activation of weights to determine which are most important
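
To make the "tracks the activation" step concrete, here is a minimal sketch of how a per-channel importance value could be collected. It assumes the statistic is the mean squared activation of each input channel over the calibration tokens; the function name `accumulate_importance` and the toy data are hypothetical, and this is not llama.cpp's actual code.

```python
# Illustrative sketch only. Assumption: importance of an input channel is the
# mean of its squared activations over all calibration tokens.

def accumulate_importance(activation_rows):
    """Return one importance value per input channel.

    activation_rows: list of rows, one row per calibration token,
    each row holding the activations fed into a given weight matrix.
    """
    n_channels = len(activation_rows[0])
    sums = [0.0] * n_channels
    for row in activation_rows:
        for j, x in enumerate(row):
            sums[j] += x * x  # large activations make a channel matter more
    return [s / len(activation_rows) for s in sums]


# Toy usage: channel 0 is active, channel 1 is always zero.
importance = accumulate_importance([[1.0, 0.0], [3.0, 0.0]])
print(importance)
```

Weights that multiply channels with a high value here are the ones whose quantization error hurts the output the most.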

However what the quantization then does with that information is where I was wrong.

I think I accidentally conflated imatrix with ExLlamaV2's measurement pass, where ExLlamaV2 decides how many bits to assign to which weights depending on the target BPW (bits per weight)

Instead, what llama.cpp with imatrix does is attempt to select a scale for each quantization block that most accurately returns the important weights to their original values, i.e. it minimizes the dequantization error weighted by the importance of the activations

The mildly surprising part is that it actually just does a relatively brute-force search: it picks a bunch of candidate scales, tries each one, and keeps whichever gives the minimum error for the weights deemed important in the group
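
The search described above can be sketched roughly like this. It is a toy version under stated assumptions, not llama.cpp's implementation: the symmetric integer range (`nmax=7`), the candidate grid (0.8x to 1.2x of the naive scale), and the helper names `quantize`, `weighted_error` and `search_scale` are all made up for illustration.

```python
# Illustrative sketch only: brute-force scale search for one quantization
# block, scoring each candidate by importance-weighted round-trip error.

def quantize(weights, scale, nmax):
    """Round each weight to an integer in [-nmax, nmax] for the given scale."""
    return [max(-nmax, min(nmax, round(w / scale))) for w in weights]


def weighted_error(weights, importance, scale, q):
    """Squared dequantization error, weighted by per-weight importance."""
    return sum(imp * (w - scale * qi) ** 2
               for w, qi, imp in zip(weights, q, importance))


def search_scale(weights, importance, nmax=7, n_candidates=41):
    """Try a grid of scales and keep the one with the lowest weighted error."""
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return 0.0, [0] * len(weights)

    # Start from the naive scale so the result is never worse than it.
    naive = amax / nmax
    best_scale = naive
    best_q = quantize(weights, naive, nmax)
    best_err = weighted_error(weights, importance, naive, best_q)

    for step in range(n_candidates):
        # Assumed grid: 0.8x to 1.2x of the naive scale.
        scale = naive * (0.8 + 0.4 * step / (n_candidates - 1))
        q = quantize(weights, scale, nmax)
        err = weighted_error(weights, importance, scale, q)
        if err < best_err:
            best_scale, best_q, best_err = scale, q, err
    return best_scale, best_q


# Toy usage: the two middle weights are marked as far more important.
block = [0.9, -0.31, 0.12, 0.05]
imp = [0.01, 10.0, 10.0, 0.01]
scale, q = search_scale(block, imp)
print(scale, q)
```

Note that the integers and the bit width never change here; only the scale does, which is the point of the correction in this post.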

But yeah, turns out, the quantization scheme is always the same, it's just that the scaling has a bit more logic to it when you use imatrix

Huge shoutout to @compilade for helping me wrap my head around it - feel free to add/correct as well if I've messed something up

Thanks @bartowski for breaking this down! :)

hi im new hello to every one

Such an appreciation for people who will make the effort to point out incidentally spreading some misinformation & then to provide the correction / update ;-)
(outside of git issues & such heh)

I saw the imatrix dataset, which is one big text file. I'm trying to recreate your wizardry in ONNX lol, and I wonder how you make sense of the whole text: how do you chunk it, etc.? Help appreciated. I'm glad you started posting; I just found out about this new feature last week. Take care. You're doing god's work, and your quants are the best. GGUF quants have come such a long way: I see smaller files and faster outputs. Even so, ONNX is beating GGUF in my tests; it just takes a more refined approach.

After examining it, the most I could take away was questions + answers + random text.

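I wrote a Python script:

# Split the calibration file into question/answer pairs.
with open("calibration_datav3.txt", "rt", encoding="utf-8") as file:
    data = file.read()

# Text before the first "Q:" marker is not a Q/A block, so drop it.
data_blocks = data.split("Q:\n\n")[1:]

for i, block in enumerate(data_blocks, 1):
    # partition() does not raise if a block has no "A:" marker.
    question, _, rest = block.partition("A:\n\n")
    # Keep only the first paragraph of the answer.
    answer = rest.strip().split("\n\n")[0].strip()
    print(f"### QUESTION:\n{question.strip()}\n")
    print(f"### ANSWER:\n{answer}")
    if i != len(data_blocks):
        print("\n---\n")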

and it gives me some structured data, although some of the answers are truncated 😅 (the script keeps only the first paragraph of each answer). Example:

### QUESTION:
как передать json на сервер

Здравствуйте, у меня есть 2 json объекта, находящиеся в javascript. Каким образом мне хранить их на сервере, файлами или в запросе передавать? Пожалуйста, с примерами кода.
Бэкэнд на ASP.NET 4.5

### ANSWER:
На клиенте конвертировать его в string:
myStringObj = JSON.stringify(myObj);

---

...

---

### QUESTION:
Show that $S_5$ does not have a quotient group isomorphic to $S_4$

Show that $S_5$ does not have a quotient group isomorphic to $S_4$.

If we to assume that $H$ is such a group, than $H$ must be normal in $S_5$ and $|H|=|S_5|/|S_4|=5$. So $H$ must be isomorphic to $\mathbb{Z}/5\Bbb Z$.
That's as far as my logic goes. I couldn't arrive at a contradiction.
Any ideas?

### ANSWER:
The possible candidates for such an $H$ are the subgroups of $S_5$ that are cyclic of order 5.  All elements of $S_5$ of order 5 are given by $5$-cycles.  However, the subgroup generated by a 5-cycle is not normal, so no $H$ can exist, as desired.

@ddh0 Read this. 😋