- Nuclear Norm Regularization for Deep Learning
  Paper • 2405.14544 • Published • 1
- Token embeddings violate the manifold hypothesis
  Paper • 2504.01002 • Published • 1
- Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
  Paper • 2403.10476 • Published • 1
- ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning
  Paper • 2504.00254 • Published • 1
Collections
Collections including paper arxiv:2505.11820
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 61
- RWKV-7 "Goose" with Expressive Dynamic State Evolution
  Paper • 2503.14456 • Published • 148
- DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
  Paper • 2503.15265 • Published • 47
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
  Paper • 2503.15558 • Published • 47

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 59
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 53
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 43
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 61

- FLAME: Factuality-Aware Alignment for Large Language Models
  Paper • 2405.01525 • Published • 29
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
  Paper • 2405.14333 • Published • 41
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 54
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12

- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 49
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 618
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 73
- Humanoid Locomotion as Next Token Prediction
  Paper • 2402.19469 • Published • 29