multimodal - a CelesteChen Collection

updated Nov 14, 2025

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published Oct 16, 2025 • 17
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16, 2025 • 118
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

Paper • 2510.13795 • Published Oct 15, 2025 • 59
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

Paper • 2510.13515 • Published Oct 15, 2025 • 12
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

Paper • 2510.12709 • Published Oct 14, 2025 • 13
HoneyBee: Data Recipes for Vision-Language Reasoners

Paper • 2510.12225 • Published Oct 14, 2025 • 11
Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 52
DeepEyesV2: Toward Agentic Multimodal Model

Paper • 2511.05271 • Published Nov 7, 2025 • 45
NVIDIA Nemotron Nano V2 VL

Paper • 2511.03929 • Published Nov 6, 2025 • 30
SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning

Paper • 2511.02280 • Published Nov 4, 2025 • 4
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4, 2025 • 59
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published Oct 24, 2025 • 31
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum

Paper • 2510.27571 • Published Oct 31, 2025 • 19