Xinchen Zhang

comin

https://cominclip.github.io/

Cominclip

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 3 days ago

See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

Paper • 2512.22120 • Published 6 days ago • 12

upvoted a paper 22 days ago

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published 23 days ago • 128

upvoted a paper 2 months ago

From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

Paper • 2510.19871 • Published Oct 22, 2025 • 29

upvoted 2 papers 3 months ago

Generative Universal Verifier as Multimodal Meta-Reasoner

Paper • 2510.13804 • Published Oct 15, 2025 • 25

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26, 2025 • 184

upvoted a paper 4 months ago

Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8, 2025 • 40

upvoted 3 papers 6 months ago

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Paper • 2507.12841 • Published Jul 17, 2025 • 41

SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation

Paper • 2507.09862 • Published Jul 14, 2025 • 49

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10, 2025 • 159

upvoted a paper 7 months ago

MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21, 2025 • 97

upvoted a paper 8 months ago

Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11, 2025 • 154

upvoted 4 papers 11 months ago

Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

Paper • 2502.12146 • Published Feb 17, 2025 • 16

HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation

Paper • 2502.12148 • Published Feb 17, 2025 • 17

Improving Video Generation with Human Feedback

Paper • 2501.13918 • Published Jan 23, 2025 • 52

EMO2: End-Effector Guided Audio-Driven Avatar Video Generation

Paper • 2501.10687 • Published Jan 18, 2025 • 15

upvoted an article 12 months ago

Explaining the SDXL latent space

Article • May 20, 2024

upvoted 3 papers about 1 year ago

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Paper • 2412.04431 • Published Dec 5, 2024 • 17

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 99

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Paper • 2410.07171 • Published Oct 9, 2024 • 43

upvoted a paper over 1 year ago

A Survey on the Honesty of Large Language Models

Paper • 2409.18786 • Published Sep 27, 2024 • 31
