Ming Chen

ChenMing-thu14

AI & ML interests

3D Human Pose Estimation

Organizations

None yet

upvoted a paper 4 days ago

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Paper • 2605.27365 • Published 6 days ago • 128

upvoted a paper 6 days ago

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Paper • 2605.25874 • Published 7 days ago • 100

upvoted a paper 10 days ago

Bernini: Latent Semantic Planning for Video Diffusion

Paper • 2605.22344 • Published 11 days ago • 12

upvoted 2 papers 12 days ago

Lance: Unified Multimodal Modeling by Multi-Task Synergy

Paper • 2605.18678 • Published 14 days ago • 76

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Paper • 2605.18739 • Published 14 days ago • 111

upvoted 4 papers about 1 month ago

WorldMark: A Unified Benchmark Suite for Interactive Video World Models

Paper • 2604.21686 • Published Apr 23 • 36

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

Paper • 2604.19636 • Published Apr 21 • 87

HDR Video Generation via Latent Alignment with Logarithmic Encoding

Paper • 2604.11788 • Published Apr 13 • 10

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Paper • 2604.14268 • Published Apr 15 • 122

upvoted 4 papers about 2 months ago

Seedance 2.0: Advancing Video Generation for World Complexity

Paper • 2604.14148 • Published Apr 15 • 163

LPM 1.0: Video-based Character Performance Model

Paper • 2604.07823 • Published Apr 9 • 80

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

Paper • 2604.11804 • Published Apr 13 • 72

Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

Paper • 2604.08995 • Published Apr 10 • 51

upvoted 7 papers 2 months ago

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published Mar 29 • 147

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

Paper • 2603.25730 • Published Mar 26 • 53

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Paper • 2603.25746 • Published Mar 26 • 155

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125

Versatile Editing of Video Content, Actions, and Dynamics without Training

Paper • 2603.17989 • Published Mar 18 • 18

SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

Paper • 2603.19228 • Published Mar 19 • 68

WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation

Paper • 2603.16871 • Published Mar 17 • 61