Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models • arXiv:2303.04671 • Published Mar 8, 2023
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation • arXiv:2303.12346 • Published Mar 22, 2023
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis • arXiv:2401.17093 • Published Jan 30, 2024
Using Left and Right Brains Together: Towards Vision and Language Planning • arXiv:2402.10534 • Published Feb 16, 2024
EG4D: Explicit Generation of 4D Object without Score Distillation • arXiv:2405.18132 • Published May 28, 2024
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model • arXiv:2502.10248 • Published Feb 14, 2025
Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model • arXiv:2503.11251 • Published Mar 14, 2025
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition • arXiv:2512.15603 • Published Dec 17, 2025