CelesteChen's Collections: multimodal
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
Paper
• 2510.15110
• Published
• 17
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper
• 2510.14528
• Published
• 118
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Paper
• 2510.13795
• Published
• 59
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Paper
• 2510.13515
• Published
• 12
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
Paper
• 2510.12709
• Published
• 13
HoneyBee: Data Recipes for Vision-Language Reasoners
Paper
• 2510.12225
• Published
• 11
Paper
• 2511.05491
• Published
• 52
DeepEyesV2: Toward Agentic Multimodal Model
Paper
• 2511.05271
• Published
• 45
NVIDIA Nemotron Nano V2 VL
Paper
• 2511.03929
• Published
• 30
SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
Paper
• 2511.02280
• Published
• 4
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Paper
• 2511.02779
• Published
• 59
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
Paper
• 2510.21583
• Published
• 31
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
Paper
• 2510.27571
• Published
• 19