A suite of recall-intensive tasks for evaluating sub-quadratic models
Linearizing LLMs with high quality and efficiency. We linearize the full Llama 3.1 model family (8B, 70B, and 405B) for the first time!
Models and Datasets for M2-BERT and LoCoV1
The models and datasets for Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers (a hedged usage sketch follows this list):
- hazyresearch/Weaver_Distilled_ModernBERT_Large_for_MMLU-Pro (text classification)
- hazyresearch/Weaver_Distilled_ModernBERT_Large_for_GPQA (text classification)
- hazyresearch/Weaver_Distilled_ModernBERT_Large_for_MATH500 (text classification)
- hazyresearch/MATH500_with_Llama_3.1_70B_Instruct_v1 (dataset)
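Below is a minimal sketch of scoring a candidate answer with one of the distilled Weaver verifiers. It assumes the checkpoint loads as a standard transformers sequence-classification model and that inputs are a question paired with a candidate response; both the input format and the label semantics are assumptions, not the documented interface.

```python
# A minimal sketch of scoring a candidate answer with a distilled
# Weaver verifier. Assumptions: the checkpoint loads as a standard
# transformers sequence-classification model (ModernBERT needs a
# recent transformers release), inputs are (question, candidate)
# text pairs, and a higher positive-class score means "correct".
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "hazyresearch/Weaver_Distilled_ModernBERT_Large_for_MMLU-Pro"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

question = "Which data structure gives O(1) average-case lookups?"
candidate = "A hash table."

inputs = tokenizer(question, candidate, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over labels; the interpretation of each label is an assumption.
probs = torch.softmax(logits, dim=-1)
print(probs)
```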
These language model checkpoints are trained at the 360M and 1.3B parameter scales for up to 50B tokens on the Pile corpus, for research purposes.
Here we provide models and benchmarks for the Just Read Twice work: https://arxiv.org/abs/2407.05483
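For context on the linked paper, here is a minimal sketch of the Just Read Twice prompting idea: the context is repeated before the question so a fixed-state recurrent model gets a second pass over it. The template wording below is an assumption, not the paper's verbatim format.

```python
# Just Read Twice (JRT) prompting sketch: repeat the context before
# asking the question, so a recurrent model can decide what to keep
# in its state on the second pass. Template wording is an assumption.
def jrt_prompt(context: str, question: str) -> str:
    return f"{context}\n\n{context}\n\n{question}"

print(jrt_prompt("Alice met Bob in Paris.", "Where did Alice meet Bob?"))
```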