# Factuality-Alignment-Qwen2.5-14B
A factuality-aligned Large Language Model fine-tuned using Factuality-Aware Direct Preference Optimization (Factual-DPO) to reduce hallucinations while preserving preference alignment.
Website: Project Page | Paper: arXiv | Dataset: Hugging Face | Code: GitHub
## Background & Motivation
Large Language Models optimized via preference learning (e.g., DPO, RLHF) often over-prefer fluent but hallucinated responses, especially when factual correctness is not explicitly supervised.
Factuality-Alignment-Qwen2.5-14B addresses this limitation by applying Factual-DPO, a factuality-aware extension of Direct Preference Optimization that:
- Integrates explicit binary factuality supervision
- Penalizes preferences that favor hallucinated responses
- Introduces margin-based factual penalties (Ξ) for controllable hallucination suppression
This model is fine-tuned from Qwen2.5-14B-Instruct using a large-scale, balanced, and synthetic factuality-aware preference dataset derived from Skywork Reward-Preference-80K.
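For illustration, a single preference record in such a factuality-aware dataset could look like the sketch below. The field names are hypothetical; the actual schema is documented with the dataset linked above.

```python
# Hypothetical preference record with binary factuality labels attached to the
# chosen/rejected responses (illustrative field names, not the real schema).
example = {
    "prompt": "What are the causes of Type 1 diabetes?",
    "chosen": "Type 1 diabetes is an autoimmune condition in which the immune "
              "system destroys insulin-producing beta cells in the pancreas ...",
    "rejected": "Type 1 diabetes is caused mainly by eating too much sugar ...",
    "h_chosen": 1,    # chosen response labeled factual
    "h_rejected": 0,  # rejected response labeled hallucinated
}
```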
## What Is Factual-DPO?
Standard DPO optimizes preference alignment without distinguishing whether the preferred response is factual.
Factual-DPO modifies the DPO objective by introducing factuality indicators:
- Each preference pair includes factuality labels (h_w, h_l)
- A margin penalty Ξ is applied when the preferred response is less factual
- Optimization pressure shifts toward factually correct preferences
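For reference, the standard DPO objective being extended is

$$
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right].
$$

One plausible way to fold in the factuality labels and the Ξ margin described above is a factuality-conditioned margin inside the sigmoid, e.g.

$$
\mathcal{L}_{\text{Factual-DPO}} = -\,\mathbb{E}\left[\log \sigma\big(\beta\,\Delta_\theta(x, y_w, y_l) - \Xi \cdot \mathbb{1}[h_w < h_l]\big)\right],
$$

where $\Delta_\theta$ denotes the log-ratio difference from the DPO term and the indicator reflects "applied when the preferred response is less factual." This is only a sketch; the exact Factual-DPO objective is defined in the paper.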
Result: Lower hallucination rates without sacrificing preference win-rate or fluency.
## Key Contributions
- Binary factuality supervision integrated into preference learning
- Synthetic hallucination inversion to balance factual vs. hallucinated pairs (see the sketch after this list)
- Ξ-margin factual penalties for controllable hallucination suppression
- Config-driven, reproducible training and evaluation pipelines
- Multi-model × multi-Ξ benchmarking at scale
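As a rough illustration of the hallucination-inversion idea (one possible reading, reusing the hypothetical record fields from the earlier example; the actual dataset construction pipeline is described in the paper):

```python
def invert_if_hallucinated(example):
    """If the originally preferred response is labeled less factual than the
    rejected one, swap the pair so the factual response becomes `chosen`.
    Field names are illustrative, not the dataset's real schema."""
    if example["h_chosen"] < example["h_rejected"]:
        example["chosen"], example["rejected"] = example["rejected"], example["chosen"]
        example["h_chosen"], example["h_rejected"] = example["h_rejected"], example["h_chosen"]
    return example
```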
## Training Overview
- Base model: Qwen2.5-14B-Instruct
- Training method: Factuality-Aware DPO (QLoRA, 4-bit NF4)
- Frameworks: TRL, Unsloth, Accelerate
- Hardware: A100 / A40 GPUs
- Objective: Reduce hallucinations while maintaining preference alignment
Each Ξ value produces a separate fine-tuned checkpoint, enabling controlled factuality–preference trade-offs.
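A minimal sketch of the QLoRA + DPO scaffolding described above, using TRL with 4-bit NF4 quantization and LoRA adapters. This is not the project's training script: the Factual-DPO margin term is not shown, and the dataset path and all hyperparameters below are illustrative assumptions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen2.5-14B-Instruct"

# 4-bit NF4 quantization, as stated in the training overview.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Illustrative LoRA hyperparameters for QLoRA fine-tuning.
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules="all-linear", task_type="CAUSAL_LM",
)

# Placeholder dataset path; see the dataset linked in this card for the real one.
train_dataset = load_dataset("path/to/factuality-preference-dataset", split="train")

args = DPOConfig(
    output_dir="factual-dpo-qwen2.5-14b",
    beta=0.1,                      # illustrative DPO temperature
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                # with PEFT, the frozen base acts as reference
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,    # `tokenizer=` in older TRL versions
    peft_config=peft_config,
)
trainer.train()
```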
## Evaluation
Evaluation is performed using GPT-4o-mini as an LLM-as-a-Judge.
### Metrics
| Metric | Description |
|---|---|
| factuality | Mean factuality score assigned by the judge |
| halluc_rate | % of outputs scoring below the factuality threshold |
| win_rate | Preference win-rate vs baseline |
| count | Number of evaluated prompts |
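For illustration, these aggregate metrics could be computed from per-prompt judge outputs along the following lines (field names and the 0–1 score scale are assumptions, not the project's evaluation schema):

```python
def aggregate_judge_results(rows, factual_threshold=0.5):
    """Aggregate per-prompt LLM-as-a-judge outputs into the metrics above.

    Assumes each row carries a `factuality` score in [0, 1] and a boolean
    `model_preferred` flag from a pairwise comparison against the baseline.
    """
    n = len(rows)
    factuality = sum(r["factuality"] for r in rows) / n
    halluc_rate = 100.0 * sum(r["factuality"] < factual_threshold for r in rows) / n
    win_rate = 100.0 * sum(r["model_preferred"] for r in rows) / n
    return {"factuality": factuality, "halluc_rate": halluc_rate,
            "win_rate": win_rate, "count": n}
```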
The Factual-DPO variants consistently show:
- Lower hallucination rate
- Higher factuality score
- Comparable or improved preference win-rate
## Usage Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "vector-institute/Factuality-Alignment-Qwen2.5-14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "What are the causes of Type 1 diabetes?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
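Because the model is derived from an instruct-tuned chat model, you can optionally format the prompt with the tokenizer's chat template before generation (the plain-text prompt above also works):

```python
# Optional: wrap the prompt in the chat template expected by instruct models.
messages = [{"role": "user", "content": "What are the causes of Type 1 diabetes?"}]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(chat_prompt, return_tensors="pt").to(model.device)
```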
## Citation

If you use this model, please cite:
```bibtex
@article{FactualAlignment2026,
  title   = {Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning},
  author  = {Sindhuja Chaduvula and Ahmed Radwan and Azib Farooq and Yani Ioannou and Shaina Raza},
  journal = {arXiv preprint arXiv:2601.03027},
  year    = {2026}
}
```