Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation Paper • 2604.25819 • Published 4 days ago • 16
SketchVLM: Vision language models can annotate images to explain thoughts and guide users Paper • 2604.22875 • Published 9 days ago • 31
dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model Paper • 2604.22152 • Published 8 days ago • 4
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges Paper • 2604.13602 • Published 17 days ago • 31
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published 8 days ago • 217
EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale Paper • 2604.17406 • Published 13 days ago • 5
Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items Paper • 2604.19748 • Published 11 days ago • 248
MultiWorld: Scalable Multi-Agent Multi-View Video World Models Paper • 2604.18564 • Published 12 days ago • 43
Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers Article • 16 days ago • 66
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation Paper • 2604.18486 • Published 12 days ago • 89
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published 12 days ago • 81
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling Paper • 2604.06916 • Published 24 days ago • 34
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models Paper • 2604.04707 • Published 26 days ago • 203
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding Paper • 2604.00528 • Published about 1 month ago • 12