TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published Dec 1, 2025 • 71
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31, 2025 • 76
Synthetic Video Enhances Physical Fidelity in Video Synthesis Paper • 2503.20822 • Published Mar 26, 2025 • 16
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11, 2025 • 57
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11, 2024 • 60
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 81