Data - a CCMat Collection

updated Jul 4, 2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Paper • 2402.00159 • Published Jan 31, 2024
Aria Everyday Activities Dataset

Paper • 2402.13349 • Published Feb 20, 2024
WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published May 2, 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Paper • 2407.02371 • Published Jul 2, 2024