Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2303.03915

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 6
cognitivecomputations/dolphin

Viewer • Updated Dec 18, 2023 • 3.73M • 380 • 399

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 6

HooshvareLab/gpt2-fa-comment

Text Generation • Updated May 21, 2021 • 31 • 2
HooshvareLab/gpt2-fa-poetry

Text Generation • Updated May 21, 2021 • 32
HooshvareLab/gpt2-fa

Text Generation • Updated May 21, 2021 • 1.79k • 14
bolbolzaban/gpt2-persian

Text Generation • Updated May 21, 2021 • 584 • 23

Dataset pruning/cleaning/dedup

AlpaGasus: Training A Better Alpaca with Fewer Data

Paper • 2307.08701 • Published Jul 17, 2023 • 22
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 6
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Paper • 2309.04662 • Published Sep 9, 2023 • 22
SlimPajama-DC: Understanding Data Combinations for LLM Training

Paper • 2309.10818 • Published Sep 19, 2023 • 10

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 1
Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering

Paper • 2308.13259 • Published Aug 25, 2023 • 2
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

Paper • 2309.05653 • Published Sep 11, 2023 • 10
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Paper • 2309.12284 • Published Sep 21, 2023 • 18

Dissecting In-Context Learning of Translations in GPTs

Paper • 2310.15987 • Published Oct 24, 2023 • 5
Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca

Paper • 2309.08958 • Published Sep 16, 2023 • 2
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

Paper • 2305.04160 • Published May 7, 2023 • 2
Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning

Paper • 2310.08166 • Published Oct 12, 2023 • 1

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs