taufiqdp's Collections
Attention Is All You Need • arXiv:1706.03762 • 49 upvotes
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • arXiv:1810.04805 • 16 upvotes
RoBERTa: A Robustly Optimized BERT Pretraining Approach • arXiv:1907.11692 • 7 upvotes
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter • arXiv:1910.01108 • 14 upvotes
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer • arXiv:1910.10683 • 10 upvotes
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity • arXiv:2101.03961 • 14 upvotes
Finetuned Language Models Are Zero-Shot Learners • arXiv:2109.01652 • 2 upvotes
Multitask Prompted Training Enables Zero-Shot Task Generalization • arXiv:2110.08207 • 2 upvotes
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts • arXiv:2112.06905 • 1 upvote
Scaling Language Models: Methods, Analysis & Insights from Training Gopher • arXiv:2112.11446 • 1 upvote
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models • arXiv:2201.11903 • 9 upvotes
LaMDA: Language Models for Dialog Applications • arXiv:2201.08239 • 4 upvotes
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model • arXiv:2201.11990 • 1 upvote
Training language models to follow instructions with human feedback • arXiv:2203.02155 • 16 upvotes
PaLM: Scaling Language Modeling with Pathways • arXiv:2204.02311 • 2 upvotes
Training Compute-Optimal Large Language Models • arXiv:2203.15556 • 10 upvotes
OPT: Open Pre-trained Transformer Language Models • arXiv:2205.01068 • 2 upvotes
UL2: Unifying Language Learning Paradigms • arXiv:2205.05131 • 5 upvotes
Language Models are General-Purpose Interfaces • arXiv:2206.06336 • 1 upvote
Improving alignment of dialogue agents via targeted human judgements • arXiv:2209.14375
Scaling Instruction-Finetuned Language Models • arXiv:2210.11416 • 7 upvotes
GLM-130B: An Open Bilingual Pre-trained Model • arXiv:2210.02414 • 3 upvotes
Holistic Evaluation of Language Models • arXiv:2211.09110 • 1 upvote
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model • arXiv:2211.05100 • 27 upvotes
Galactica: A Large Language Model for Science • arXiv:2211.09085 • 4 upvotes
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization • arXiv:2212.12017 • 1 upvote
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning • arXiv:2301.13688 • 8 upvotes
LLaMA: Open and Efficient Foundation Language Models • arXiv:2302.13971 • 13 upvotes
PaLM-E: An Embodied Multimodal Language Model • arXiv:2303.03378
GPT-4 Technical Report • arXiv:2303.08774 • 5 upvotes
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling • arXiv:2304.01373 • 9 upvotes
PaLM 2 Technical Report • arXiv:2305.10403 • 6 upvotes
RWKV: Reinventing RNNs for the Transformer Era • arXiv:2305.13048 • 15 upvotes
Llama 2: Open Foundation and Fine-Tuned Chat Models • arXiv:2307.09288 • 243 upvotes
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • arXiv:2312.00752 • 138 upvotes
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 • arXiv:2306.02707 • 46 upvotes
Textbooks Are All You Need • arXiv:2306.11644 • 142 upvotes
Textbooks Are All You Need II: phi-1.5 technical report • arXiv:2309.05463 • 87 upvotes
Mistral 7B • arXiv:2310.06825 • 47 upvotes
PaLI-3 Vision Language Models: Smaller, Faster, Stronger • arXiv:2310.09199 • 25 upvotes
Zephyr: Direct Distillation of LM Alignment • arXiv:2310.16944 • 123 upvotes
CodeFusion: A Pre-trained Diffusion Model for Code Generation • arXiv:2310.17680 • 70 upvotes
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents • arXiv:2311.05437 • 48 upvotes
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models • arXiv:2311.16079 • 20 upvotes
SeaLLMs -- Large Language Models for Southeast Asia • arXiv:2312.00738 • 23 upvotes
Kandinsky 3.0 Technical Report • arXiv:2312.03511 • 43 upvotes
Large Language Models for Mathematicians • arXiv:2312.04556 • 11 upvotes
FLM-101B: An Open LLM and How to Train It with $100K Budget • arXiv:2309.03852 • 44 upvotes
arXiv:2309.03450 • 8 upvotes
Baichuan 2: Open Large-scale Language Models • arXiv:2309.10305 • 19 upvotes
Qwen Technical Report • arXiv:2309.16609 • 35 upvotes
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch • arXiv:2309.10706 • 16 upvotes
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning • arXiv:2310.09478 • 19 upvotes
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models • arXiv:2308.13437 • 3 upvotes
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4 • arXiv:2308.12067 • 4 upvotes
JudgeLM: Fine-tuned Large Language Models are Scalable Judges • arXiv:2310.17631 • 33 upvotes
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation • arXiv:2311.00272 • 9 upvotes
ChipNeMo: Domain-Adapted LLMs for Chip Design • arXiv:2311.00176 • 8 upvotes
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model • arXiv:2310.06266 • 1 upvote
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models • arXiv:2312.04724 • 20 upvotes
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • arXiv:2312.15166 • 56 upvotes
Generative Multimodal Models are In-Context Learners • arXiv:2312.13286 • 34 upvotes
Code Llama: Open Foundation Models for Code • arXiv:2308.12950 • 24 upvotes
Unsupervised Cross-lingual Representation Learning at Scale • arXiv:1911.02116
YAYI 2: Multilingual Open-Source Large Language Models • arXiv:2312.14862 • 13 upvotes
Mini-GPTs: Efficient Large Language Models through Contextual Pruning • arXiv:2312.12682 • 8 upvotes
Gemini: A Family of Highly Capable Multimodal Models • arXiv:2312.11805 • 44 upvotes
LLM360: Towards Fully Transparent Open-Source LLMs • arXiv:2312.06550 • 57 upvotes
WizardLM: Empowering Large Language Models to Follow Complex Instructions • arXiv:2304.12244 • 13 upvotes
The Falcon Series of Open Language Models • arXiv:2311.16867 • 13 upvotes
Clinical Camel: An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding • arXiv:2305.12031 • 5 upvotes
ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge • arXiv:2303.14070 • 11 upvotes
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day • arXiv:2306.00890 • 10 upvotes
BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights • arXiv:2311.16075 • 6 upvotes
KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model • arXiv:2311.11564 • 1 upvote
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences • arXiv:2311.06025 • 1 upvote
BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations • arXiv:2310.07276 • 5 upvotes
BIOptimus: Pre-training an Optimal Biomedical Language Model with Curriculum Learning for Named Entity Recognition • arXiv:2308.08625 • 2 upvotes
BioCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval • arXiv:2307.00589 • 1 upvote
Radiology-GPT: A Large Language Model for Radiology • arXiv:2306.08666 • 1 upvote
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks • arXiv:2305.17100 • 2 upvotes
Dr. LLaMA: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation • arXiv:2305.07804 • 2 upvotes
Llemma: An Open Language Model For Mathematics • arXiv:2310.10631 • 50 upvotes
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model • arXiv:2309.11568 • 10 upvotes
Skywork: A More Open Bilingual Foundation Model • arXiv:2310.19341 • 5 upvotes
SkyMath: Technical Report • arXiv:2310.16713 • 2 upvotes
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models • arXiv:2309.12284 • 18 upvotes
UT5: Pretraining Non autoregressive T5 with unrolled denoising • arXiv:2311.08552 • 7 upvotes
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model • arXiv:2312.11370 • 20 upvotes
Language Is Not All You Need: Aligning Perception with Language Models • arXiv:2302.14045
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing • arXiv:2303.10845
BloombergGPT: A Large Language Model for Finance • arXiv:2303.17564 • 21 upvotes
PMC-LLaMA: Towards Building Open-source Language Models for Medicine • arXiv:2304.14454
StarCoder: may the source be with you! • arXiv:2305.06161 • 29 upvotes
OctoPack: Instruction Tuning Code Large Language Models • arXiv:2308.07124 • 28 upvotes
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones • arXiv:2312.16862 • 30 upvotes
GeoGalactica: A Scientific Large Language Model in Geoscience • arXiv:2401.00434 • 7 upvotes
TinyLlama: An Open-Source Small Language Model • arXiv:2401.02385 • 89 upvotes
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism • arXiv:2401.02954 • 41 upvotes
Mixtral of Experts • arXiv:2401.04088 • 158 upvotes
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts • arXiv:2401.04081 • 70 upvotes
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models • arXiv:2401.06066 • 43 upvotes
WizardCoder: Empowering Code Large Language Models with Evol-Instruct • arXiv:2306.08568 • 28 upvotes
ChatQA: Building GPT-4 Level Conversational QA Models • arXiv:2401.10225 • 34 upvotes
Orion-14B: Open-source Multilingual Large Language Models • arXiv:2401.12246 • 12 upvotes
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence • arXiv:2401.14196 • 47 upvotes
Weaver: Foundation Models for Creative Writing • arXiv:2401.17268 • 43 upvotes
H2O-Danube-1.8B Technical Report • arXiv:2401.16818 • 17 upvotes
OLMo: Accelerating the Science of Language Models • arXiv:2402.00838 • 82 upvotes
GPT-NeoX-20B: An Open-Source Autoregressive Language Model • arXiv:2204.06745 • 1 upvote
CroissantLLM: A Truly Bilingual French-English Language Model • arXiv:2402.00786 • 25 upvotes
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT • arXiv:2402.16840 • 23 upvotes
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases • arXiv:2402.14905 • 126 upvotes
Nemotron-4 15B Technical Report • arXiv:2402.16819 • 42 upvotes
StarCoder 2 and The Stack v2: The Next Generation • arXiv:2402.19173 • 136 upvotes
Gemma: Open Models Based on Gemini Research and Technology • arXiv:2403.08295 • 47 upvotes
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context • arXiv:2403.05530 • 61 upvotes
Sailor: Open Language Models for South-East Asia • arXiv:2404.03608 • 20 upvotes
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework • arXiv:2404.14619 • 126 upvotes
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone • arXiv:2404.14219 • 253 upvotes
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence • arXiv:2404.05892 • 32 upvotes
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model • arXiv:2405.04434 • 14 upvotes
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence • arXiv:2406.11931 • 57 upvotes
Aya 23: Open Weight Releases to Further Multilingual Progress • arXiv:2405.15032 • 27 upvotes
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models • arXiv:2406.06563 • 17 upvotes
Instruction Pre-Training: Language Models are Supervised Multitask Learners • arXiv:2406.14491 • 86 upvotes
The Llama 3 Herd of Models • arXiv:2407.21783 • 110 upvotes