arxiv:2601.18753

HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

Published on Jan 26 · Submitted by Xinyue Zeng on Jan 27

Abstract

A theoretical framework and detection method for identifying hallucinations in large language models by analyzing data-driven and reasoning-driven components through neural tangent kernel-based scoring.

AI-generated summary

The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we introduce HalluGuard, an NTK-based score that leverages the induced geometry and captured representations of the NTK to jointly identify data-driven and reasoning-driven hallucinations. We evaluate HalluGuard on 10 diverse benchmarks, 11 competitive baselines, and 9 popular LLM backbones, consistently achieving state-of-the-art performance in detecting diverse forms of LLM hallucinations.

Community

Paper author · Paper submitter

🚀 HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
Accepted at ICLR 2026

In this work, we introduce HalluGuard, a unified, theory-driven framework for hallucination detection in large language models.
Rather than treating hallucination as a single failure mode, HalluGuard explicitly decomposes it into data-driven and reasoning-driven components, and detects both at inference time with no retraining, no labels, and no external references.


😆 Key Takeaways


🧠 Two Sources of Hallucination

LLM hallucinations arise from two fundamentally different mechanisms:

  • Data-driven hallucinations
    Errors rooted in biased, incomplete, or mismatched knowledge acquired during pretraining or finetuning.

  • Reasoning-driven hallucinations
    Errors caused by instability and error amplification during multi-step autoregressive decoding.

Most existing detectors focus on only one of these. HalluGuard shows that real hallucinations often emerge from their interaction and evolve across decoding steps.


📐 Hallucination Risk Bound (Theory)

We introduce a Hallucination Risk Bound, which formally decomposes total hallucination risk into:

  • a representation bias term (training-time mismatch), and
  • a decoding instability term (inference-time amplification).

The analysis reveals a key insight:
hallucinations originate from semantic approximation gaps and are then exponentially amplified during long-horizon generation.

This provides a principled explanation of how hallucinations emerge and evolve in LLMs.
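
As a rough schematic of this decomposition (an illustrative sketch only: the symbols below are assumed placeholders rather than the paper's notation, and the precise statement, norms, and constants are given in the paper):

$$R_{\mathrm{halluc}}(T) \;\lesssim\; \underbrace{\varepsilon_{\mathrm{rep}}}_{\text{representation bias (training-time)}} \;+\; \underbrace{C\, e^{\lambda T}\, \varepsilon_{\mathrm{dec}}}_{\text{decoding instability (inference-time)}},$$

where $T$ is the number of decoding steps: the first term reflects the semantic approximation gap inherited from training, and the exponential factor captures how small decoding errors are amplified over long-horizon generation.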


🔍 HalluGuard Score (Method)

Building on this theory, we propose HalluGuard, a lightweight NTK-based hallucination score:

$$\mathrm{HalluGuard}(u_h) = \det(K) + \log \sigma_{\max} - \log\!\big(\kappa(K)^2\big).$$

Higher HalluGuard score ⇒ lower hallucination risk.
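
As a minimal numerical sketch of this score, assuming $K$ is the (symmetric, positive semi-definite) NTK Gram matrix built from the model's representations, and that $\sigma_{\max}$ and $\kappa(K)$ denote its largest singular value and condition number (the exact construction of $K$ follows the paper and is not reproduced here):

```python
import numpy as np

def halluguard_score(K: np.ndarray) -> float:
    """HalluGuard score for an NTK Gram matrix K, per the formula above:

        HalluGuard(u_h) = det(K) + log(sigma_max) - log(kappa(K)^2)

    Assumes sigma_max and kappa(K) are the largest singular value and the
    condition number of K; how K is built from the model's hidden states
    is described in the paper.
    """
    sigmas = np.linalg.svd(K, compute_uv=False)  # singular values, descending
    sigma_max, sigma_min = sigmas[0], sigmas[-1]
    kappa = sigma_max / sigma_min                # condition number kappa(K)
    return float(np.linalg.det(K) + np.log(sigma_max) - np.log(kappa**2))

# Toy usage with an illustrative Gram matrix over 16 hypothetical token states.
H = np.random.randn(16, 768)   # 16 tokens, 768-dim hidden states (made up)
K = H @ H.T / H.shape[1]       # simple empirical kernel, for illustration only
print(halluguard_score(K))     # higher score => lower estimated risk
```

For large $K$, $\det(K)$ can under- or overflow in floating point; a more robust variant of this sketch would work with `np.linalg.slogdet` instead.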


📊 Strong Empirical Results

We evaluate HalluGuard across:

  • 10 benchmarks (QA, math reasoning, instruction following),
  • 11 competitive baselines, and
  • 9 LLM backbones (from GPT-2 to 70B-scale models).

Results:

  • πŸ† Consistent state-of-the-art AUROC / AUPRC across all task families
  • πŸ” Especially strong gains on multi-step reasoning benchmarks (MATH-500, BBH)
  • 🧩 Robust detection of fine-grained semantic hallucinations (PAWS), even when surface forms are nearly identical

🧭 Beyond Detection: Test-Time Guidance

HalluGuard can also be used to guide test-time inference, significantly improving reasoning accuracy by steering generation away from unstable trajectories, without modifying or retraining the model.
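
As an illustrative sketch of such guidance, the snippet below reranks a handful of sampled continuations by the HalluGuard score (reusing `halluguard_score` from the sketch above). Both hooks, `sample_fn` and `gram_fn`, are hypothetical user-supplied callables, and the paper's actual guidance procedure may steer decoding step by step rather than rerank whole candidates:

```python
from typing import Callable
import numpy as np

def guided_generate(sample_fn: Callable[[], str],
                    gram_fn: Callable[[str], np.ndarray],
                    num_candidates: int = 4) -> str:
    """Best-of-N selection guided by the HalluGuard score.

    sample_fn : hypothetical hook that samples one candidate continuation
                from the LLM for a fixed prompt.
    gram_fn   : hypothetical hook mapping a continuation to its NTK Gram
                matrix K (constructed as in the paper).
    This rerank only illustrates score-guided inference; it is not the
    paper's exact test-time guidance procedure.
    """
    candidates = [sample_fn() for _ in range(num_candidates)]
    # Higher HalluGuard score => lower estimated hallucination risk,
    # so keep the highest-scoring candidate.
    return max(candidates, key=lambda c: halluguard_score(gram_fn(c)))
```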


🔑 Takeaway

HalluGuard (ICLR 2026) provides:

  • a theoretical lens for understanding how hallucinations emerge and evolve, and
  • a practical, plug-and-play detector for modern LLMs.

It bridges representation geometry and decoding dynamics, offering a unified foundation for reliable reasoning and uncertainty-aware inference.

Feedback and discussion are very welcome 🙌

