AI Starter Pack

community

AI & ML interests

None defined yet.

Recent Activity

ai-starter-pack's activity

nyuuzyou 
posted an update 3 days ago
✈ Thanks for the interest shown in the FlightAware Photos dataset (nyuuzyou/flightaware). Seeing its potential, I'm working on expanding it to over 1 million images soon.

---

🎹 Introducing the PaintBerri Hand-Drawn Art Dataset - nyuuzyou/paintberri

A collection of 68,860 digital hand-drawn artworks featuring:

- Unique images sourced directly from the paintberri.com online art community
- Rich metadata including creator-provided titles, descriptions, and timestamps
- Image dimensions, thumbnail URLs, and NSFW content flags
- Creator IDs (where available) and unique short identifiers for each piece

This dataset offers a distinct visual archive capturing diverse styles and subjects from an active online drawing community, suitable for image classification and image-to-text tasks. Opt-out is available for creators wishing to remove their work.
nyuuzyou 
posted an update 7 days ago
✈ FlightAware Photos Dataset - nyuuzyou/flightaware

Collection of approximately 197,718 aviation photographs featuring:
- High-quality aircraft images across multiple sizes and formats
- Comprehensive metadata including aircraft registrations, types, and photographer information
- View counts, ratings, and submission timestamps for each photo
- Rich classification data preserving original titles, descriptions, and photographer badges

This dataset offers a unique visual archive of aircraft spanning commercial, military, and private aviation, captured by FlightAware's community of photographers and shared under the CC BY-NC-SA 3.0 license.
nyuuzyou 
posted an update 10 days ago
nyuuzyou 
posted an update 11 days ago
📚 Archive of Our Own (AO3) Dataset - nyuuzyou/archiveofourown

Collection of approximately 12.6 million fanfiction works (from 63.2M processed IDs) featuring:
- Full text content from diverse fandoms across television, film, books, anime, and more
- Comprehensive metadata including warnings, relationships, characters, and tags
- Multilingual content, with works in 40+ languages, though English predominates
- Rich classification data preserving author-created folksonomy and content categorization

P.S. This is the most expensive dataset I've created so far! And thank you all for 100 followers on Hugging Face!
nyuuzyou 
posted an update 13 days ago
I am planning to release *something big* this week, but in the meantime I was bored, so I quickly made a small dataset in as-is format.

📱 Sponsr.ru Dataset - nyuuzyou/sponsr

Collection of 44,138 posts from Sponsr.ru, a Russian content subscription platform, featuring:
- Comprehensive metadata including project details, post information, and pricing
- Detailed content categorization with images, videos, and text formats
- Monolingual Russian content from diverse creator projects
nyuuzyou 
posted an update 28 days ago
🐮 Fimfiction.net Writings Dataset - nyuuzyou/fimfiction

Collection of 815,740+ stories from Fimfiction.net featuring:
- Full story content from diverse fanfiction authors across the platform
- Complete metadata including titles, unique identifiers, and publication details
- Rich structural information preserving story formatting and author notes
- English-language content with diverse writing styles and narrative approaches
matrixy 
in ai-starter-pack/README about 1 month ago
nyuuzyou 
posted an update about 1 month ago
🌐 Public MediaWiki Collection Dataset - nyuuzyou/wikis

Collection of 1.66M+ articles from 930 public MediaWiki instances featuring:

- Full article content from diverse public wikis across the internet
- Complete metadata including templates, categories, and section structure
- Rich structural information preserving wiki organization and links
- Multilingual content across 35+ languages including English, Chinese, Spanish, and more
- Regional language variants including US/UK English, Brazilian Portuguese, and Traditional/Simplified Chinese

Key contents:
- 1,662,448 wiki articles with full text
- Extensive metadata including templates, categories, sections
- Internal wikilinks and external reference information
- Cross-domain knowledge spanning multiple topics and fields
aadilxdev 
in ai-starter-pack/README about 1 month ago
mateitudose 
in ai-starter-pack/README about 1 month ago
lyftium 
in ai-starter-pack/README about 1 month ago
Bhavivai27 
in ai-starter-pack/README about 1 month ago
BananaSauce 
in ai-starter-pack/README about 1 month ago