Generate captions for music audio
Generate edited video frames using text prompts
Generate images from sketches, edges, poses, and depth maps
Engage in multimedia chat with LLMs and ML models