arxiv:2606.18021

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

Published on Jun 16

Submitted by Lalit Yadav on Jun 19

Independent Research


Authors:

Lalit Yadav ,

Abstract

LegalHalluLens audits AI systems in legal workflows by identifying specific error patterns and directional biases in hallucinations across different claim types, enabling more reliable deployment through targeted diagnostic and mitigation approaches.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

AI systems deployed in legal workflows hallucinate at rates that aggregate metrics report at ~52%, but this average conceals where errors concentrate and in which direction they run, leaving compliance officers without an actionable signal for trustworthy deployment. We present LegalHalluLens, an auditing framework with three components: typed hallucination profiles across four legally-motivated claim categories (numeric, temporal, obligation/entitlement, factual) over CUAD (Hendrycks et al., 2021); a Risk Direction Index (RDI) that reduces omission-versus-invention bias to a single deployment-comparable scalar; and a typed debate pipeline calibrated to both magnitudes and directions. Across 510 contracts and 249,252 clause-level instances we measure a within-model gap of approximately 38-40 pp between obligation/numeric and temporal claims that aggregate reporting hides, and show that two systems with matched 52% rates can carry opposite RDIs. The debate pipeline reduces fabricated detections by 45% with per-category gains tracking the diagnosis, matching commercial APIs with a substantially smaller backbone (4B active parameters). Typed profiles and RDI surface failure modes that aggregate metrics hide; we further show these diagnostics serve as calibration inputs for multi-agent debate pipelines, where Skeptic challenges and asymmetric gates targeted at measured failure modes outperform generically-tuned debate. The framework supports direction-aware procurement, accountability, and agent design for legal AI deployed in the wild.
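To illustrate why a typed profile can reveal what an aggregate rate hides, here is a minimal sketch. The record format, the `typed_profile` helper, and the counts are all hypothetical, chosen only for illustration; they are not the paper's data or code.

```python
from collections import defaultdict

# Hypothetical audit records: (claim_category, is_hallucinated).
# The counts are made up for illustration and are not the paper's results.
records = (
    [("temporal", True)] * 29 + [("temporal", False)] * 71
    + [("obligation", True)] * 70 + [("obligation", False)] * 30
)

def typed_profile(records):
    """Return per-category hallucination rates and the pooled aggregate rate."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for category, is_hallucinated in records:
        totals[category] += 1
        errors[category] += int(is_hallucinated)
    per_category = {c: errors[c] / totals[c] for c in totals}
    aggregate = sum(errors.values()) / sum(totals.values())
    return per_category, aggregate

per_category, aggregate = typed_profile(records)

# Spread between the worst and best category, in percentage points.
gap_pp = 100 * (max(per_category.values()) - min(per_category.values()))

print(f"aggregate rate: {aggregate:.1%}")
for category, rate in sorted(per_category.items()):
    print(f"  {category}: {rate:.1%}")
print(f"within-model gap: {gap_pp:.0f} pp")
```

In this toy example the pooled rate sits near 50% while the two categories differ by about 41 percentage points, which is the kind of gap a single aggregate number cannot show.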


Community

lalitdv9 (paper author, paper submitter) · about 13 hours ago, edited about 13 hours ago

Hi Hugging Face community!

My co-author and I are excited to share LegalHalluLens, which was recently accepted at the ICML 2026 AIWILD workshop. We built this framework to address a massive blind spot in current LLM benchmarking: aggregate accuracy scores completely mask catastrophic domain-specific failures.

🔍 What we did:

Massive Audit: We evaluated 249,252 clause-level instances across commercial and open-source models to map exact failure profiles on high-liability text.
The "Average" Lie: We show that a blended 52% error rate hides a gap of roughly 40 percentage points. Models do comparatively well on easier questions (dates/terms, ~29% error) but fail far more often on high-liability clauses (liability caps/indemnities, 65%-74% error).
The Risk Direction Index (RDI): We introduced a directional metric to quantify whether a model is an "Omitter" (silently dropping rules) or an "Inventor" (hallucinating fake rules).
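The post does not spell out the RDI formula, so the following is only a sketch under an assumed definition: a signed ratio in [-1, 1] where positive values mean the model leans toward invention and negative values toward omission. The function name, the formula, and the error counts are assumptions for illustration, not the paper's definition.

```python
def risk_direction_index(omissions: int, inventions: int) -> float:
    """Assumed definition (not the paper's): signed share of error direction.

    +1.0 -> every error is an invention (fabricated clause)
    -1.0 -> every error is an omission (missed clause)
     0.0 -> errors are balanced, or there are no errors at all
    """
    total = omissions + inventions
    if total == 0:
        return 0.0
    return (inventions - omissions) / total

# Two hypothetical systems, each wrong on 52 of 100 instances,
# but with the errors split in opposite directions.
system_a = {"omissions": 39, "inventions": 13}
system_b = {"omissions": 13, "inventions": 39}

rdi_a = risk_direction_index(**system_a)
rdi_b = risk_direction_index(**system_b)

print(f"System A: 52% error rate, RDI = {rdi_a:+.2f} (omitter)")
print(f"System B: 52% error rate, RDI = {rdi_b:+.2f} (inventor)")
```

Under this assumed definition, both systems report the same error rate yet land at -0.50 and +0.50, which is the "matched rates, opposite direction" situation the metric is meant to expose.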

The Open-Source Fix:

Instead of relying on massive closed-source APIs, we used these empirical failure profiles to calibrate an asymmetric 6-role multi-agent debate pipeline.

By forcing the agents through targeted safety gates, we enabled a lightweight 4B active parameter model (Gemma) to cut fabrications by 45%—effectively matching the composite performance of commercial frontier APIs while drastically lowering inference costs.
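One way to picture an asymmetric gate is a pair of acceptance thresholds that tighten on whichever side the model tends to err. The sketch below is a hypothetical illustration, not the released pipeline: `Gate`, `calibrate_gate`, `passes_gate`, the `base` and `strength` parameters, and the signed bias score (assumed to lie in [-1, 1], positive meaning invention-leaning) are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Gate:
    accept_present: float  # minimum confidence to accept a "clause present" verdict
    accept_absent: float   # minimum confidence to accept a "clause absent" verdict

def calibrate_gate(bias: float, base: float = 0.5, strength: float = 0.3) -> Gate:
    """Hypothetical rule: raise the bar on the side the model errs toward.

    bias > 0 (invention-leaning) -> stricter on "present" verdicts
    bias < 0 (omission-leaning)  -> stricter on "absent" verdicts
    """
    shift = strength * abs(bias)
    if bias > 0:
        return Gate(accept_present=base + shift, accept_absent=base)
    if bias < 0:
        return Gate(accept_present=base, accept_absent=base + shift)
    return Gate(accept_present=base, accept_absent=base)

def passes_gate(gate: Gate, verdict: str, confidence: float) -> bool:
    """Return True if a verdict ("present" or "absent") clears its threshold."""
    if verdict not in ("present", "absent"):
        raise ValueError(f"unknown verdict: {verdict!r}")
    threshold = gate.accept_present if verdict == "present" else gate.accept_absent
    return confidence >= threshold

# A category where the model leans toward inventing clauses.
gate = calibrate_gate(bias=0.5)
print(gate)
print(passes_gate(gate, "present", 0.6))  # held back: stricter side
print(passes_gate(gate, "absent", 0.6))   # accepted: default side
```

The design choice here is that the same confidence is treated differently depending on the verdict's direction, so a measured bias translates directly into a stricter check on the riskier side.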

Everything—the dataset processing scripts, the RDI evaluation suite, and the calibrated multi-agent pipeline—is fully open-source.

We would love to hear the community's thoughts on using directional metrics for agent alignment and routing!


