Nuclear Norm Regularization for Deep Learning
Paper • 2405.14544 • Published
Understanding representations sheds light on optimization.
Note: The Cauchy–Schwarz inequality for matrices lets an element-wise Frobenius-norm penalty stand in for the nuclear norm, encouraging low-rank representations.
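A minimal numerical sketch (not code from the paper) of the bound behind this note: for any factorization M = AB, Cauchy–Schwarz on singular values gives ||AB||_* ≤ ||A||_F ||B||_F, and AM–GM gives ||A||_F ||B||_F ≤ ½(||A||_F² + ||B||_F²), so penalizing squared Frobenius norms of the factors upper-bounds the nuclear norm of the product:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))
B = rng.standard_normal((4, 8))
M = A @ B  # the low-rank product whose nuclear norm we want to control

# Nuclear norm: sum of singular values of M
nuclear = np.linalg.norm(M, ord="nuc")

# Frobenius surrogate: cheap, element-wise, differentiable everywhere
frob_surrogate = 0.5 * (np.linalg.norm(A, "fro") ** 2
                        + np.linalg.norm(B, "fro") ** 2)

# The surrogate upper-bounds the nuclear norm
assert nuclear <= frob_surrogate + 1e-9
```

Because the surrogate only needs element-wise sums of squares (no SVD), it is far cheaper to penalize during training than the nuclear norm itself.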
Note: Some tokens have more synonyms than others.
Note: Customized attention masks with optimized performance comparable to FlashAttention.
Note: Halve the KV cache by sharing value embeddings across attention blocks.
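One way to realize this note (a sketch of my own, not necessarily the scheme the note refers to): if the key projection W_k is square and invertible, V can be reconstructed from the cached K via V = K W_k⁻¹ W_v, so only K needs to be stored and the KV cache is halved. All names below (`W_k`, `W_v`, `X`) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
X = rng.standard_normal((10, d))    # token embeddings (10 tokens, dim d)
W_k = rng.standard_normal((d, d))   # key projection, assumed square/invertible
W_v = rng.standard_normal((d, d))   # value projection

K = X @ W_k            # only K is kept in the cache
V_direct = X @ W_v     # what a standard KV cache would also store

# Since K = X W_k, we have X = K W_k^{-1}, hence V = K (W_k^{-1} W_v).
# W_kv can be precomputed once per layer.
W_kv = np.linalg.solve(W_k, W_v)
V_from_K = K @ W_kv    # values recovered on the fly from the shared cache

assert np.allclose(V_direct, V_from_K)
```

The trade-off is an extra d×d matmul per decoded token in exchange for storing half as much cache; it relies on W_k being invertible, which generic learned projections typically are.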