Generate expressive speech from text using an audio reference
VoxCPM
Generate depth maps from images
Generate an inverse depth map from an image
Segment and caption objects in images and videos