spuun is trying

spuun

https://discord.thisisartunion.com

AI & ML interests

NLG and AI applications for artistic purposes. Specifically for the Art Union Discord server.

Recent Activity

published a Space 7 days ago

spuun/mwsamanaga

reacted to grimjim's post with 🚀 14 days ago

This recent paper points to an explanation for the unreasonable effectiveness of Frankenmerges: https://huggingface.co/papers/2502.05171

Specifically, the duplication of layers in Frankenmerges serves a purpose similar to what occurs in their recurrent-depth architecture. Successful Frankenmerges that operate without additional fine-tuning are able to recover or "heal" from any damage due to abrupt transitions between layer blocks. Operational replicated layer blocks can provide functional benefits grounded in latent reasoning. Frankenmerges can also result in hybrid reasoning, by splicing together the latent reasoning of different models.

Back in April 2024, I was able to duplicate a few layers in the Llama 3 8B model, turning it into a 9B model, without harming benchmarks significantly, despite any transition damage. https://huggingface.co/grimjim/llama-3-experiment-v1-9B

My informal experimentation suggested that latent reasoning circuits could occupy contiguous stacks of 2-4 layers, though the result was highly sensitive to the choice of transition location between layers.
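The layer duplication described in the post can be illustrated with a minimal sketch of the resulting layer order. This is not grimjim's actual recipe: the helper name `duplicate_block` and the block boundaries (layers 12 to 15) are hypothetical, chosen only for illustration. The only fixed fact used is that Llama 3 8B has 32 decoder layers.

```python
def duplicate_block(num_layers: int, start: int, end: int) -> list[int]:
    """Return the layer order after repeating the block [start, end) once.

    Hypothetical helper for illustration; a real merge tool would copy
    the weights of each listed layer into a new, deeper model.
    """
    if not 0 <= start < end <= num_layers:
        raise ValueError("block must satisfy 0 <= start < end <= num_layers")
    original = list(range(num_layers))
    # Keep everything up to the end of the block, replay the block,
    # then continue with the remaining layers.
    return original[:end] + original[start:end] + original[end:]


# Assumed example: repeat a contiguous stack of 4 layers in a 32-layer model.
plan = duplicate_block(num_layers=32, start=12, end=16)
print(len(plan))
print(plan[10:22])
```

The transition the post calls sensitive is the point where the plan jumps backward (here from layer 15 to layer 12); choosing different `start` and `end` values moves that transition.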

upvoted a paper 14 days ago

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Organizations

None yet

spuun's activity

liked a model 17 days ago

jxm/sentence-transformers_all-MiniLM-L6-v2msmarco128

Updated Oct 31, 2023 • 50 • 2

liked a model 20 days ago

mradermacher/L3.2-JametMini-3B-MK.III-GGUF

Updated Oct 13, 2024 • 94 • 3

liked a Space 5 months ago

Llama3.1 Instruct O1

Generate detailed step-by-step answers to questions

liked a model almost 2 years ago

mrm8488/Alpacoom

Text Generation • Updated Mar 24, 2023 • 75