Mantas Mazeika PRO

mmazeika

AI & ML interests

None yet

mmazeika's activity

updated a model 20 days ago

mmazeika/emergent-values-data

Updated 20 days ago

published a model 20 days ago

mmazeika/emergent-values-data

Updated 20 days ago

authored a paper about 1 month ago

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 65

authored 7 papers about 1 year ago

Measuring Coding Challenge Competence With APPS

Paper • 2105.09938 • Published May 20, 2021 • 1

Representation Engineering: A Top-Down Approach to AI Transparency

Paper • 2310.01405 • Published Oct 2, 2023 • 5

Deep Anomaly Detection with Outlier Exposure

Paper • 1812.04606 • Published Dec 11, 2018

Forecasting Future World Events with Neural Networks

Paper • 2206.15474 • Published Jun 30, 2022 • 1

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Paper • 2402.04249 • Published Feb 6, 2024 • 4

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

Paper • 1906.12340 • Published Jun 28, 2019

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Paper • 2206.04615 • Published Jun 9, 2022 • 5

authored 2 papers over 1 year ago

An Overview of Catastrophic AI Risks

Paper • 2306.12001 • Published Jun 21, 2023

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Paper • 2306.11698 • Published Jun 20, 2023 • 12

authored a paper almost 2 years ago

Measuring Massive Multitask Language Understanding

Paper • 2009.03300 • Published Sep 7, 2020 • 3