stable diffusion split out: + remove noise from existing images (train a model for this) + make random images from noise + condition the model on text + classifier-free guidance
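sketch for the split-out above (not Stable Diffusion's actual code, just the two formulas that usually sit behind "add/remove noise" and "classifier-free guidance"). Assumptions: plain NumPy arrays stand in for images, `alpha_bar_t` is a hypothetical cumulative noise-schedule value in [0, 1], and the `eps_*` arrays stand in for the outputs of a trained noise-prediction model that is not shown here.

```python
import numpy as np


def add_noise(x0, alpha_bar_t, rng):
    """Blend a clean image x0 with Gaussian noise (the forward process).

    A denoising model would be trained to predict `eps` from `x_t`.
    alpha_bar_t = 1 keeps the image intact; alpha_bar_t = 0 gives pure noise.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: push the unconditional noise prediction
    toward (and past) the text-conditioned one.

    guidance_scale = 0 -> unconditional, 1 -> conditional, >1 -> stronger
    adherence to the text prompt.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.zeros((4, 4))                # toy stand-in for an image
    noisy, true_eps = add_noise(image, alpha_bar_t=0.5, rng=rng)

    # Stand-ins for two forward passes of a trained model
    # (one without the text prompt, one with it).
    eps_uncond = rng.standard_normal(image.shape)
    eps_cond = rng.standard_normal(image.shape)
    guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=7.5)
    print(noisy.shape, guided.shape)
```

"make random images from noise" then amounts to starting from pure Gaussian noise and repeatedly applying the (guided) noise prediction to step back toward a clean image; the sampler loop is left out here because it depends on the chosen schedule.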
multimodal: text to image (stable diffusion)
multimodal: text to video??
diffusion networks (can it be in NN, or does it require domain-specific knowledge?)
"3D reconstruction from multiple images" h3 on images (supervised) + cnn/pooling + capsules