Hugging Face
spuun is trying
spuun
4 followers · 4 following
https://discord.thisisartunion.com
spuunistrying
spuuntries
AI & ML interests
NLG and AI applications for artistic purposes, specifically for the Art Union Discord server.
Recent Activity
published a Space 7 days ago
spuun/mwsamanaga
reacted to grimjim's post 14 days ago
This recent paper points to an explanation for the unreasonable effectiveness of frankenmerges: https://huggingface.co/papers/2502.05171 Specifically, the duplication of layers in frankenmerges serves a purpose similar to what occurs in the paper's recurrent-depth architecture. Successful frankenmerges that operate without additional fine-tuning are able to recover, or "heal", from damage caused by abrupt transitions between layer blocks. Operational replicated layer blocks can provide functional benefits grounded in latent reasoning. Frankenmerges can also result in hybrid reasoning, by splicing together the latent reasoning of different models. Back in April 2024, I was able to duplicate a few layers in the Llama 3 8B model, turning it into a 9B model, without harming benchmarks significantly, despite any transition damage. https://huggingface.co/grimjim/llama-3-experiment-v1-9B My informal experimentation suggested that latent reasoning circuits could occupy contiguous stacks of 2-4 layers, though the result was highly sensitive to the choice of transition location between layers.
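The layer duplication described in the post can be pictured as a reordering of layer indices. The following is a minimal sketch, not grimjim's actual recipe: the function name `duplicate_block` and the chosen block position (start 12, length 4) are hypothetical and picked only for illustration. It assumes a 32-layer decoder stack, which is the depth of Llama 3 8B, and only computes which source layer would sit at each position of the deeper model; it does not load or modify any weights.

```python
def duplicate_block(num_layers: int, start: int, length: int) -> list[int]:
    """Return source-layer indices for a stack in which the contiguous
    block [start, start + length) appears twice in a row.

    Hypothetical helper for illustration only; it plans the layer order
    and does not touch model weights.
    """
    if start < 0 or length <= 0 or start + length > num_layers:
        raise ValueError("block must lie inside the original layer stack")
    block_end = start + length
    # Layers 0..block_end-1 once, then the block again, then the rest.
    return list(range(block_end)) + list(range(start, num_layers))


# Assumed example: repeat a 4-layer block in a 32-layer model.
order = duplicate_block(num_layers=32, start=12, length=4)
print(len(order))      # total depth of the merged stack
print(order[12:20])    # the duplicated block and its repeat
```

The abrupt transition the post mentions sits at the seam where the last layer of the first copy feeds the first layer of the second copy (here, layer 15 followed by layer 12), which is why the choice of `start` and `length` matters.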
upvoted a paper 14 days ago
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Organizations
None yet
spuun's activity
liked a model 17 days ago
jxm/sentence-transformers_all-MiniLM-L6-v2__msmarco__128
Updated Oct 31, 2023 • 50 • 2
liked a model 20 days ago
mradermacher/L3.2-JametMini-3B-MK.III-GGUF
Updated Oct 13, 2024 • 94 • 3
liked a Space 5 months ago
Llama3.1 Instruct O1 • Running • 143
Generate detailed step-by-step answers to questions
liked a model almost 2 years ago
mrm8488/Alpacoom
Text Generation • Updated Mar 24, 2023 • 75