Jack Voide

Mindweller
Mindweller's activity

reacted to eaddario's post with 👍 4 days ago
Squeezing out tensor bits?

I have been tinkering with quantization and pruning to reduce model sizes. So far, I've had modest success in producing, on average, 8% smaller versions with negligible loss of quality, and I think further reductions in the 10-15% range are realistic, but I've come across a behaviour I wasn't expecting!

Part of the process I'm following consists of quantizing the embedding and output layers aggressively. Since the embedding layer is more about lookup than complex computation, the relative distances between embedding vectors are usually preserved well enough, making this layer fairly robust to quantization. So far, so good.
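That robustness intuition is easy to demonstrate on toy data. The sketch below is not the actual GGUF quantization kernels, just a simple per-row symmetric quantize/dequantize round-trip on a random embedding table, checking how well each row's direction (cosine similarity) survives at 4 bits:

```python
# Sketch (not the real GGUF/K-quant scheme): symmetric per-row int quantization
# of a toy embedding matrix, measuring how well cosine similarity survives.
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64)).astype(np.float32)  # toy embedding table

def quantize_dequantize(x, bits=4):
    """Round-trip: quantize each row to signed integers, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    q = np.round(x / scale).clip(-qmax, qmax)
    return q * scale

def cosine(a, b):
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

deq = quantize_dequantize(emb, bits=4)
sims = cosine(emb, deq)  # each row stays close to its original direction
print(f"mean cosine similarity after 4-bit round-trip: {sims.mean():.4f}")
```

Because nearest-neighbour lookups depend on these relative directions rather than exact values, the layer tolerates fairly coarse quantization.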

The output layer, on the other hand, maps the final hidden state to the vocabulary logits, so small changes in these logits could shift the probability distribution over the vocabulary and result in incorrect word predictions, or so I thought.

Surprisingly, I'm finding that even at Q2_K the loss of overall capability is minimal. Was this to be expected, or am I missing something?
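One possible explanation (a toy illustration, not the author's evaluation): greedy decoding only cares about the argmax, so quantization noise in the logits is harmless whenever the top logit leads the runner-up by more than the noise. A minimal sketch with Gaussian noise standing in for quantization error:

```python
# Sketch: noisy logits often keep the same top-1 token. When the winning logit
# leads by a margin larger than the quantization noise, argmax (and hence
# greedy decoding) is unchanged even though every probability shifts slightly.
import numpy as np

rng = np.random.default_rng(1)
vocab = 32000
logits = rng.normal(scale=2.0, size=vocab)
logits[123] += 20.0  # a confident prediction: large margin over the runner-up

noise = rng.normal(scale=0.5, size=vocab)  # stand-in for quantization error
noisy = logits + noise

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

same_top1 = np.argmax(logits) == np.argmax(noisy)
shift = np.abs(softmax(logits) - softmax(noisy)).sum()  # L1 distribution change
print(same_top1, f"L1 change in distribution: {shift:.4f}")
```

If most next-token predictions are high-margin like this, aggressive output-layer quantization would mostly perturb the low-probability tail, which could be consistent with the small quality loss observed.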

I have published a version with all the test results if you want to give it a try: eaddario/DeepSeek-R1-Distill-Qwen-7B-GGUF

I'll upload other models as time allows.

Any ideas / clarifications / suggestions are very much welcomed!
reacted to mkurman's post with 👍❤️ 4 days ago
Introducing a new architecture, MedIT One – a single-token transformer with LSTM-like recurrence.

It is extremely fast in training and inference, but we lack funding for large-scale training. Enjoy 🍓

https://github.com/MedITSolutionsKurman/medit-one
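The real architecture lives in the linked repo; purely as an illustration of the general idea of consuming one token per step while carrying a recurrent state (every name, gate, and dimension below is hypothetical, not taken from the MedIT One code):

```python
# Hypothetical sketch of single-token processing with LSTM-like recurrence.
# This is NOT the MedIT One architecture, just the general pattern: one token
# in, a gated update to a fixed-size hidden state, no attention cache.
import numpy as np

class OneTokenCell:
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(3, dim, dim))  # gate/candidate weights

    def step(self, x, h):
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        f = sigmoid(x @ self.W[0] + h @ self.W[1])  # forget-style gate
        c = np.tanh(x @ self.W[2])                  # candidate update
        return f * h + (1.0 - f) * c                # gated state update

dim = 16
cell = OneTokenCell(dim)
h = np.zeros(dim)
for tok in np.random.default_rng(1).normal(size=(5, dim)):  # 5 toy embeddings
    h = cell.step(tok, h)  # O(1) state per step, independent of sequence length
print(h.shape)
```

The appeal of this pattern is constant memory per step, which is one plausible reason such a design would be fast in both training and inference.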

reacted to retronic's post with 🤝 22 days ago
The Colox idea is being replaced with a clone of OpenAI's Deep Research due to retraining and reasoning issues.

So now I am working on a Deep Research system with Ollama that will function like OpenAI's version for FREE! Keep in mind this will be a local alternative, so no potato PC can handle it.