Would one- or two-sentence tl;dr summaries of datasets on the Hub be useful to you?
For example, for the togethercomputer/RedPajama-Data-1T dataset, would the following summary help give you a quick sense of its content?
> tl;dr: RedPajama is a fully open-source implementation of the LLaMa dataset, consisting of 1.2 trillion tokens from sources like Commoncrawl, C4, GitHub, Books, ArXiv, Wikipedia, and StackExchange, primarily in English, and is structured with metadata for each text sample.
I've created a dataset with example summaries of the 500 most liked datasets on the Hub: davanstrien/dataset-tldr
Would these kinds of summaries be helpful?