SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
Paper • 2605.10453 • Published • 9
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding