Omartificial-Intelligence-Space's Collections

Huggingface FineWeb2 Arabic Dataset Portions

updated 1 day ago

A collection of Arabic text datasets extracted from the FineWeb2 project, covering Modern Standard Arabic (MSA) and several regional dialects.


  • HuggingFaceFW/fineweb-2

    Updated Jan 8 • 12.5B rows • 46.5k downloads • 489 likes

    Note: This is the original FineWeb2 repository, which covers over 1,000 languages. The Arabic portions are listed below.


  • Omartificial-Intelligence-Space/FineWeb2-MSA

    Updated Dec 15, 2024 • 907M rows • 632 downloads • 1 like

  • Omartificial-Intelligence-Space/FineWeb2-Egyptian-Arabic

    Updated Dec 12, 2024 • 23.9M rows • 67 downloads • 2 likes

  • Omartificial-Intelligence-Space/FineWeb2-Moroccan-Arabic

    Updated Dec 12, 2024 • 69.6M rows • 222 downloads • 1 like

  • Omartificial-Intelligence-Space/FineWeb2-North-Levantine-Arabic

    Updated Dec 12, 2024 • 223k rows • 22 downloads • 1 like

  • Omartificial-Intelligence-Space/FineWeb2-Najdi-Arabic

    Updated Dec 12, 2024 • 48.4M rows • 62 downloads • 1 like
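
A minimal Python sketch for pulling a few documents from one of the Arabic subsets listed above, assuming the Hugging Face `datasets` library is installed. `load_dataset(..., streaming=True)` is part of the `datasets` API; the `"train"` split and the `"text"` column are assumptions about these repositories, and the helper names `repo_for` and `preview` are hypothetical.

```python
from itertools import islice

# Repository IDs as listed in this collection; the variety labels are
# taken from the repository names.
ARABIC_FINEWEB2_REPOS = {
    "MSA": "Omartificial-Intelligence-Space/FineWeb2-MSA",
    "Egyptian": "Omartificial-Intelligence-Space/FineWeb2-Egyptian-Arabic",
    "Moroccan": "Omartificial-Intelligence-Space/FineWeb2-Moroccan-Arabic",
    "North Levantine": "Omartificial-Intelligence-Space/FineWeb2-North-Levantine-Arabic",
    "Najdi": "Omartificial-Intelligence-Space/FineWeb2-Najdi-Arabic",
}


def repo_for(variety: str) -> str:
    """Return the Hub repository ID for an Arabic variety (case-insensitive)."""
    wanted = variety.strip().lower()
    for name, repo_id in ARABIC_FINEWEB2_REPOS.items():
        if name.lower() == wanted:
            return repo_id
    raise KeyError(
        f"unknown variety {variety!r}; choose from {list(ARABIC_FINEWEB2_REPOS)}"
    )


def preview(variety: str, n: int = 3, text_column: str = "text") -> list:
    """Stream the first n documents without downloading the whole dataset.

    ASSUMPTIONS (not confirmed by the collection page): each repository
    exposes a "train" split and stores documents in a "text" column.
    """
    # Imported lazily so repo_for() works even without the `datasets` package.
    from datasets import load_dataset

    ds = load_dataset(repo_for(variety), split="train", streaming=True)
    return [row[text_column] for row in islice(ds, n)]
```

For example, `preview("Najdi", n=2)` would return the first two documents of the Najdi subset; this call needs the `datasets` package and network access. Streaming is used because several of these subsets contain tens or hundreds of millions of rows.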