Pietro Lesci
pietrolesci
AI & ML interests
I like developing and applying causal methods to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.
Recent Activity
updated a model 5 days ago: pietrolesci/small_unigramlm128k
updated a collection 5 days ago: UnimixLM
published a model 5 days ago: pietrolesci/small_unigramlm128k
Organizations
The Pile Companion
Machine Translation Datasets
A curated collection of machine translation datasets
Dialogue State Tracking Datasets
A curated collection of datasets used in Dialogue State Tracking research
AnchorAL
Artefacts for the paper "AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets" (Lesci and Vlachos, 2024)
- AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets
  Paper • 2404.05623 • Published • 3
- pietrolesci/anchoral-paper-artefacts
  Viewer • Updated • 2.78M • 55
- pietrolesci/amazoncat-13k
  Viewer • Updated • 5.99M • 120 • 1
- pietrolesci/wikitoxic
  Viewer • Updated • 894k • 108 • 1
Tokenisation-Bias
Interesting Pre-Training Datasets
Generalisation-Profiles
Text Classification Datasets
A curated collection of common datasets for text classification
NLI Eval Datasets
A curated collection of NLI evaluation datasets. Each dataset is exactly as originally proposed
Memorisation-Profiles
Artefacts for the paper "Causal Estimation of Memorisation Profiles" (Lesci et al., 2024)
UnimixLM