Patronus AI

Team

company

Verified

https://patronus.ai

patronusai

AI & ML interests

LLM Evaluation

Recent Activity

DarshanDeshpande published a dataset about 3 hours ago

PatronusAI/world_model_corpus

DarshanDeshpande updated a dataset about 3 hours ago

PatronusAI/world_model_corpus

akkikiki authored a paper about 1 month ago

Diable: Efficient Dialogue State Tracking as Operations on Tables

Papers

Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis

MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments

DarshanDeshpande

published a dataset about 3 hours ago

PatronusAI/world_model_corpus

Viewer • Updated about 3 hours ago • 284k • 43

DarshanDeshpande

updated a dataset about 3 hours ago

PatronusAI/world_model_corpus

Viewer • Updated about 3 hours ago • 284k • 43

akkikiki

authored 2 papers about 1 month ago

Diable: Efficient Dialogue State Tracking as Operations on Tables

Paper • 2305.17020 • Published May 26, 2023

Unlocking Prompt Infilling Capability for Diffusion Language Models

Paper • 2604.03677 • Published Apr 4

DarshanDeshpande

submitted a paper to Daily Papers 4 months ago

Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis

Paper • 2601.20103 • Published Jan 27 • 1

akkikiki

authored a paper 7 months ago

Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge

Paper • 2510.18196 • Published Oct 21, 2025

DarshanDeshpande

authored a paper 7 months ago

MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments

Paper • 2510.01353 • Published Oct 1, 2025 • 3

DarshanDeshpande

authored a paper about 1 year ago

TRAIL: Trace Reasoning and Agentic Issue Localization

Paper • 2505.08638 • Published May 13, 2025 • 6

anandnk24

authored 5 papers about 1 year ago

Lynx: An Open Source Hallucination Evaluation Model

Paper • 2407.08488 • Published Jul 11, 2024

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

Paper • 2311.08370 • Published Nov 14, 2023

FinanceBench: A New Benchmark for Financial Question Answering

Paper • 2311.11944 • Published Nov 20, 2023

GLIDER: Grading LLM Interactions and Decisions using Explainable Ranking

Paper • 2412.14140 • Published Dec 18, 2024 • 1

Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning

Paper • 2503.19193 • Published Mar 24, 2025 • 1

DarshanDeshpande

authored a paper about 1 year ago

Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning

Paper • 2503.19193 • Published Mar 24, 2025 • 1

vgtomahawk

authored a paper over 1 year ago

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24, 2025 • 77

DarshanDeshpande

authored a paper over 1 year ago

GLIDER: Grading LLM Interactions and Decisions using Explainable Ranking

Paper • 2412.14140 • Published Dec 18, 2024 • 1

DarshanDeshpande

updated 2 models over 1 year ago

PatronusAI/glider-gguf

4B • Updated Dec 19, 2024 • 29 • 3

PatronusAI/glider

Text Generation • 4B • Updated Jan 2, 2025 • 1.15k • 44

allenpark

updated a Space over 1 year ago

GLIDER

🦅

GLIDER: Grading LLM Interactions and Decisions using Explain

DarshanDeshpande

updated a dataset over 1 year ago

PatronusAI/glider-feedback-bench-suite

Viewer • Updated Dec 18, 2024 • 1k • 16 • 1

Team members 18
