I have been tinkering with quantization and pruning to reduce model sizes. So far, I've had modest success producing versions that are, on average, 8% smaller with negligible loss of quality, and I think further reductions in the 10-15% range are realistic. But I've come across a behaviour I wasn't expecting!
Part of the process I'm following consists of quantizing the embedding and output layers aggressively. Since the embedding layer is essentially a lookup table rather than a site of complex computation, the relative distances between embedding vectors are usually preserved well enough after quantization, which makes this layer fairly robust. So far, so good.
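To put a rough number on "preserved well enough", here's a minimal numpy sketch. It is not the real Q2_K scheme, just naive symmetric round-to-nearest per block, and it uses a random matrix as a stand-in for an actual embedding table; the point is only to show one way of measuring how much of the embedding geometry survives as the bit width drops:

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_quantize(w: np.ndarray, bits: int, block: int = 32) -> np.ndarray:
    """Naive symmetric round-to-nearest quantization per block of `block` weights,
    then dequantize. A toy stand-in for real k-quants like Q2_K, not the real thing."""
    flat = w.reshape(-1, block)
    qmax = 2 ** (bits - 1) - 1                       # 1 for 2-bit, 7 for 4-bit, 127 for 8-bit
    scale = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                          # avoid division by zero on all-zero blocks
    q = np.clip(np.round(flat / scale), -qmax, qmax)
    return (q * scale).reshape(w.shape).astype(w.dtype)

def pairwise_cosine(e: np.ndarray) -> np.ndarray:
    """Cosine similarity of every unique pair of rows, as a flat vector."""
    e_n = e / np.linalg.norm(e, axis=1, keepdims=True)
    sims = e_n @ e_n.T
    return sims[np.triu_indices_from(sims, k=1)]

# Random matrix standing in for an embedding table: 1000 "tokens", 256 dims.
emb = rng.standard_normal((1000, 256)).astype(np.float32)
sims_fp = pairwise_cosine(emb)

for bits in (8, 4, 2):
    sims_q = pairwise_cosine(fake_quantize(emb, bits=bits))
    corr = np.corrcoef(sims_fp, sims_q)[0, 1]
    print(f"{bits}-bit: correlation of pairwise cosine similarities = {corr:.4f}")
```

On a real model you'd swap in the actual embedding tensor, but even this toy version makes the robustness claim something you can measure rather than just assume.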
The output layer, on the other hand, maps the final hidden state to the vocabulary logits, so even small perturbations to its weights can shift the probability distribution over the vocabulary and produce different token predictions. Or so I thought.
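That expectation is also easy to quantify with the same toy quantizer: perturb a random output projection, push random "hidden states" through it, and count how often the top-1 token flips (everything here is a random stand-in, not real model weights):

```python
import numpy as np

rng = np.random.default_rng(1)

def fake_quantize(w, bits, block=32):
    """Same toy symmetric per-block round-to-nearest quantizer as in the embedding sketch."""
    flat = w.reshape(-1, block)
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    return (np.clip(np.round(flat / scale), -qmax, qmax) * scale).reshape(w.shape)

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Random stand-ins: a 32k-vocab output projection and a batch of "final hidden states".
d_model, vocab, n = 512, 32000, 256
W_out = rng.standard_normal((vocab, d_model)).astype(np.float32) / np.sqrt(d_model)
hidden = rng.standard_normal((n, d_model)).astype(np.float32)
logits_fp = hidden @ W_out.T

for bits in (8, 4, 2):
    logits_q = hidden @ fake_quantize(W_out, bits=bits).T
    top1 = np.mean(logits_fp.argmax(axis=1) == logits_q.argmax(axis=1))
    p, q = softmax(logits_fp), softmax(logits_q)
    kl = np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1))
    print(f"{bits}-bit output layer: top-1 agreement = {top1:.2%}, mean KL = {kl:.4f}")
```

If the top-1 agreement craters at 2 bits in this toy setup but the real model holds up, one plausible (hedged) explanation is that a trained model's logit distribution is usually far more peaked than random noise, so the winning token tends to have a large margin over the runner-up and survives the perturbation.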
Surprisingly, I'm finding that even at Q2_K the loss of overall capability is minimal. Was this to be expected, or am I missing something?
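For context, this is roughly how I'm spot-checking "minimal loss of capability": side-by-side greedy completions from the original and the Q2_K GGUF. This sketch assumes the llama-cpp-python bindings and uses hypothetical file names; a more rigorous comparison would be perplexity over a held-out text, but this catches the obvious regressions quickly.

```python
# A quick, unscientific side-by-side check, assuming the llama-cpp-python bindings
# are installed. File names are hypothetical placeholders for my own GGUF files.
from llama_cpp import Llama

PROMPTS = [
    "Explain the difference between a list and a tuple in Python.",
    "Summarise the plot of Hamlet in two sentences.",
]

def greedy_answers(model_path: str) -> list[str]:
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    # temperature=0 keeps decoding deterministic, so any difference in the output
    # comes from the quantized weights rather than sampling noise.
    return [
        llm(p, max_tokens=128, temperature=0.0)["choices"][0]["text"].strip()
        for p in PROMPTS
    ]

baseline = greedy_answers("model-f16.gguf")    # original precision
quantized = greedy_answers("model-q2_k.gguf")  # aggressively quantized version

for prompt, a, b in zip(PROMPTS, baseline, quantized):
    print(f"PROMPT: {prompt}\n  f16 : {a}\n  q2_k: {b}\n")
```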
The Colox idea is being replaced with a clone of OpenAI Deep Research, due to issues with retraining and reasoning.
So now I am working on a Deep Research system with Ollama that will function like OpenAI's version, for FREE! It will be a local alternative, though keep in mind that no potato PC can handle this.