Making Dialogue Grounding Data Rich: A Three-Tier Data Synthesis Framework for Generalized Referring Expression Comprehension Paper • 2512.02791 • Published Dec 2, 2025 • 1
QTSplus Collection Official models and datasets for the paper (https://arxiv.org/abs/2511.11910) • 7 items • Updated Dec 2, 2025 • 1
Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models Paper • 2511.11910 • Published Nov 14, 2025 • 35
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix Paper • 2505.13032 • Published May 19, 2025 • 2
μ^2Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation Paper • 2507.00316 • Published Jun 30, 2025 • 15
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following Paper • 2506.12285 • Published Jun 14, 2025 • 54
3D Gaussian Splatting for Real-Time Radiance Field Rendering Paper • 2308.04079 • Published Aug 8, 2023 • 194