QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration
Abstract
QEIL v2 improves energy efficiency and performance of large language model inference on edge devices through physics-based adaptive optimization and workload-aware resource allocation.
Deploying large language models (LLMs) on heterogeneous edge devices demands frameworks that jointly optimize energy efficiency, inference quality, and reliability. Our prior QEIL v1 (Kumar & Jha, 2026) achieved a 4.82x IPW improvement but relied on static efficiency factors, greedy optimization, and unverified candidate selection. QEIL v2 replaces every static heuristic with physics-grounded, runtime-adaptive models. We introduce three device-workload metrics -- DASI (roofline-derived compute utilization), CPQ (memory pressure from allocation theory), and Φ (thermal yield from CMOS leakage physics) -- which form a unified energy equation whose every coefficient is traceable to semiconductor physics. For optimization, PGSAM (Pareto-Guided Simulated Annealing with Momentum) simultaneously minimizes energy, latency, and device underutilization. At inference time, the EAC/ARDE selection cascade with CSVET early stopping provides progressive verification among repeated samples. Evaluated on WikiText-103, GSM8K, and ARC-Challenge across seven model families (125M-8B parameters, including one pre-quantized variant), QEIL v2 achieves 75.7% pass@k at 63.8W (IPW=0.9749), a 2.86x improvement over standard inference. Applied to a 4-bit Llama-3.1-8B, QEIL v2's physics-grounded routing achieves IPW=1.024 at 54.8W -- the first edge orchestration system to surpass the IPW=1.0 empirical reference mark -- with the gain attributable entirely to workload-adaptive device allocation on a model with reduced memory-bandwidth requirements. Relative to standard inference, total energy drops 75.6% and latency falls 38.3%, with zero thermal throttling and 100% fault recovery across all benchmarks and model families.
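The abstract describes DASI as a roofline-derived compute-utilization metric. The paper's exact formula is not given here, but the standard roofline model it is said to derive from can be sketched as follows; the function name, inputs, and example device numbers are illustrative assumptions, not the paper's API.

```python
# Illustrative sketch of a roofline-style utilization score (in the
# spirit of DASI); NOT the paper's published formula.

def roofline_utilization(flops: float, bytes_moved: float,
                         peak_flops: float, peak_bw: float) -> float:
    """Fraction of the device's peak compute that is attainable for a
    kernel with the given operation count and memory traffic."""
    intensity = flops / bytes_moved                     # arithmetic intensity, FLOPs/byte
    attainable = min(peak_flops, peak_bw * intensity)   # roofline ceiling
    return attainable / peak_flops                      # in [0, 1]

# Example: a memory-bound decode-style kernel on a hypothetical device
# with 100 GFLOP/s peak compute and 10 GB/s memory bandwidth
# (ridge point at 10 FLOPs/byte).
u = roofline_utilization(flops=2e9, bytes_moved=1e9,
                         peak_flops=100e9, peak_bw=10e9)
# u == 0.2: the kernel sits well left of the ridge point, so the device
# can deliver at most 20% of its peak compute on this workload.
```

Under this lens, a utilization far below 1.0 signals a memory-bound workload, which is consistent with the abstract's observation that a 4-bit model (reduced memory-bandwidth demand) benefits most from workload-adaptive allocation.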
Community
Excited to share QEIL v2: Roofline-Derived Pareto-Optimal Edge Intelligence via First-Principles Energy Modeling and Multi-Objective Orchestration.
This work replaces static heuristics with physics-grounded runtime modeling using DASI, CPQ, and Φ, then combines them with PGSAM for Pareto-optimal device allocation and an EAC/ARDE verification cascade for higher-quality inference. On our heterogeneous edge platform, QEIL v2 achieves 75.7% pass@k at 63.8W (IPW=0.9749), 75.6% lower energy than standard inference, and zero thermal throttling across benchmarks.
We believe this provides a practical path toward reliable, high-efficiency LLM deployment on resource-constrained edge hardware. Feedback and discussion are very welcome.
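PGSAM is described above as simultaneously minimizing energy, latency, and device underutilization. The core Pareto-dominance test such a multi-objective search relies on can be sketched as follows; the function names and candidate tuples are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: Pareto dominance over the three objectives PGSAM is
# said to minimize (energy, latency, device underutilization).
# Tuple layout and sample values are illustrative only.

def dominates(a: tuple, b: tuple) -> bool:
    """True if allocation `a` is no worse than `b` on every objective
    and strictly better on at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

def pareto_front(candidates: list) -> list:
    """Keep only candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# (energy in J, latency in s, underutilization fraction)
cands = [(60.0, 1.2, 0.30),   # balanced
         (55.0, 1.5, 0.20),   # lowest energy, slowest
         (70.0, 1.1, 0.40),   # fastest, most energy
         (65.0, 1.3, 0.35)]   # dominated by the first candidate
front = pareto_front(cands)   # first three survive
```

A simulated-annealing loop (as the "SA" in PGSAM suggests) would then perturb allocations and accept moves that land on or near this front, rather than greedily optimizing a single objective as in QEIL v1.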
The following papers were recommended by the Semantic Scholar API
- TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition (2026)
- PowerFlow-DNN: Compiler-Directed Fine-Grained Power Orchestration for End-to-End Edge AI Inference (2026)
- DualScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS (2026)
- MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment (2026)
- WVA: A Global Optimization Control Plane for llmd (2026)
- LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure (2026)
- BOute: Cost-Efficient LLM Serving with Heterogeneous LLMs and GPUs via Multi-Objective Bayesian Optimization (2026)
Get this paper in your agent: `hf papers read 2602.06057` (install the latest CLI with `curl -LsSf https://hf.co/cli/install.sh | bash`)