Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context Paper • 2605.13831 • Published 3 days ago • 81
Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis Paper • 2602.03139 • Published Feb 3 • 44
Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars Paper • 2602.01538 • Published Feb 2 • 15
Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models Paper • 2601.19834 • Published Jan 27 • 25
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests Paper • 2601.06953 • Published Jan 11 • 46
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning Paper • 2512.22120 • Published Dec 26, 2025 • 15
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published Dec 9, 2025 • 134
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published Oct 22, 2025 • 30
Generative Universal Verifier as Multimodal Meta-Reasoner Paper • 2510.13804 • Published Oct 15, 2025 • 28