Siamese Vision Transformers are Scalable Audio-visual Learners Paper • 2403.19638 • Published Mar 28, 2024
V2M-Zero: Zero-Pair Time-Aligned Video-to-Music Generation Paper • 2603.11042 • Published 1 day ago • 2
SiLVR: A Simple Language-based Video Reasoning Framework Paper • 2505.24869 • Published May 30, 2025 • 5
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos Paper • 2409.07450 • Published Sep 11, 2024 • 11