YedsonUQ's Collections

Foundational Deep Learning - Architecture

updated 30 days ago

  • Forgetting Transformer: Softmax Attention with a Forget Gate

    Paper • 2503.02130 • Published Mar 3 • 32

  • L^2M: Mutual Information Scaling Law for Long-Context Language Modeling

    Paper • 2503.04725 • Published Mar 6 • 21

  • Transformers without Normalization

    Paper • 2503.10622 • Published Mar 13 • 164

  • I-Con: A Unifying Framework for Representation Learning

    Paper • 2504.16929 • Published Apr 23 • 30