Collections including paper arxiv:2005.14165

- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- Playing Atari with Deep Reinforcement Learning
  Paper • 1312.5602 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 11

- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 125
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
  Paper • 2409.12122 • Published • 1
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 13
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 69

- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
  Paper • 2408.15545 • Published • 34
- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 62
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 40
- Automated Design of Agentic Systems
  Paper • 2408.08435 • Published • 38

- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 11
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 242

- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 13
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 16
- MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
  Paper • 2407.21770 • Published • 22

- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 11

- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 142
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 104
- Large Language Models Struggle to Learn Long-Tail Knowledge
  Paper • 2211.08411 • Published • 3

- Long-form factuality in large language models
  Paper • 2403.18802 • Published • 24
- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 11
- A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
  Paper • 2310.12321 • Published • 1

- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 11
- OPT: Open Pre-trained Transformer Language Models
  Paper • 2205.01068 • Published • 2

- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 36
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 242