LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations Paper • 2509.03405 • Published 4 days ago • 17
Mapping Toxic Comments Across Demographics: A Dataset from German Public Broadcasting Paper • 2508.21084 • Published 12 days ago • 1
KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications Paper • 2503.17247 • Published Mar 21 • 1 • 2
German4All - A Dataset and Model for Readability-Controlled Paraphrasing in German Paper • 2508.17973 • Published 13 days ago • 1 • 5