Factuality-Alignment-Qwen2.5-14B

A factuality-aligned Large Language Model fine-tuned using Factuality-Aware Direct Preference Optimization (Factual-DPO) to reduce hallucinations while preserving preference alignment.

Website: Project Page  |  Paper: arXiv  |  Dataset: Hugging Face  |  Code: GitHub


🧭 Background & Motivation

Large Language Models optimized via preference learning (e.g., DPO, RLHF) often over-prefer fluent but hallucinated responses, especially when factual correctness is not explicitly supervised.

Factuality-Alignment-Qwen2.5-14B addresses this limitation by applying Factual-DPO, a factuality-aware extension of Direct Preference Optimization that:

  • Integrates explicit binary factuality supervision
  • Penalizes preferences that favor hallucinated responses
  • Introduces margin-based factual penalties (Δ) for controllable hallucination suppression

This model is fine-tuned from Qwen2.5-14B-Instruct using a large-scale, balanced, and synthetic factuality-aware preference dataset derived from Skywork Reward-Preference-80K.


🧠 What Is Factual-DPO?

Standard DPO optimizes preference alignment without distinguishing whether the preferred response is factual.

Factual-DPO modifies the DPO objective by introducing factuality indicators:

  • Each preference pair includes factuality labels (h_w, h_l)
  • A margin penalty Δ is applied when the preferred response is less factual
  • Optimization pressure shifts toward factually correct preferences

➡️ Result:
Lower hallucination rates without sacrificing preference win-rate or fluency.
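
For intuition, a minimal PyTorch sketch of how such a Δ-margin could enter the standard DPO objective is shown below. This is an illustration only: the function name, tensor layout, and the exact placement of the penalty are assumptions, and the precise Factual-DPO objective is defined in the paper.

import torch
import torch.nn.functional as F

def factual_dpo_loss(logp_chosen, logp_rejected,          # log-probs under the policy
                     ref_logp_chosen, ref_logp_rejected,  # log-probs under the frozen reference
                     h_chosen, h_rejected,                 # binary factuality labels (1 = factual)
                     beta=0.1, delta=1.0):
    # Standard DPO implicit rewards
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Apply the Δ-margin only when the preferred response is less factual than the rejected one
    margin = delta * (h_rejected - h_chosen).clamp(min=0)
    return -F.logsigmoid(chosen_reward - rejected_reward - margin).mean()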


✨ Key Contributions

  • 🔍 Binary factuality supervision integrated into preference learning
  • 🧪 Synthetic hallucination inversion to balance factual vs hallucinated pairs
  • 📏 Δ-margin factual penalties for controllable hallucination suppression
  • ⚙️ Config-driven, reproducible training and evaluation pipelines
  • 📊 Multi-model × multi-Δ benchmarking at scale

🧪 Training Overview

  • Base model: Qwen2.5-14B-Instruct
  • Training method: Factuality-Aware DPO (QLoRA, 4-bit NF4)
  • Frameworks: TRL, Unsloth, Accelerate
  • Hardware: A100 / A40 GPUs
  • Objective: Reduce hallucinations while maintaining preference alignment

Each Δ value produces a separate fine-tuned checkpoint, enabling controlled factuality–preference trade-offs.
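
For reference, here is a minimal sketch of what a QLoRA + TRL preference-training setup along these lines might look like. The dataset path, LoRA settings, and hyperparameters are placeholders, and a stock DPOTrainer implements standard DPO only; the Δ-margin factual penalty would be layered on top via the repository's config-driven pipeline. Assumes a recent TRL release in which DPOTrainer accepts processing_class.

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen2.5-14B-Instruct"

# 4-bit NF4 quantization so the 14B base fits on a single A100/A40 for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# LoRA adapters (illustrative rank/alpha)
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Preference pairs with prompt / chosen / rejected columns (placeholder dataset path)
train_dataset = load_dataset("path/to/factuality-preference-dataset", split="train")

training_args = DPOConfig(
    output_dir="factual-dpo-qwen2.5-14b",
    beta=0.1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()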


📊 Evaluation

Evaluation is performed using GPT-4o-mini as an LLM-as-a-Judge.

Metrics

Metric        Description
factuality    Mean factual score
halluc_rate   % of outputs below the factual threshold
win_rate      Preference win-rate vs. the baseline
count         Number of evaluated prompts
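
A small aggregation sketch showing how per-prompt judge outputs could be rolled up into these metrics. The record field names and the 0.5 threshold are assumptions for illustration, not the evaluation code used for the reported numbers.

def aggregate_metrics(records, factual_threshold=0.5):
    # Each record is assumed to look like:
    #   {"factual_score": float in [0, 1], "model_wins": bool}
    n = len(records)
    factuality = sum(r["factual_score"] for r in records) / n
    halluc_rate = 100.0 * sum(r["factual_score"] < factual_threshold for r in records) / n
    win_rate = 100.0 * sum(r["model_wins"] for r in records) / n
    return {"factuality": factuality, "halluc_rate": halluc_rate, "win_rate": win_rate, "count": n}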

The Factual-DPO variants consistently show:

  • ↓ hallucination rate
  • ↑ factuality score
  • Comparable or improved preference win-rate

🚀 Usage Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "vector-institute/Factuality-Alignment-Qwen2.5-14B"

# Load the tokenizer and model (bfloat16 weights, automatic device placement)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "What are the causes of Type 1 diabetes?"

# Format the prompt with the chat template expected by the instruction-tuned base model
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Citation: If you use this model, please cite:

@article{FactualAlignment2026,
  title={Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning},
  author={Sindhuja Chaduvula and Ahmed Radwan and Azib Farooq and Yani Ioannou and Shaina Raza},
  journal={arXiv preprint arXiv:2601.03027},
  year={2026}
}