Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs • Paper • 2603.16932 • Published 13 days ago • 81
HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering • Paper • 2603.18558 • Published 8 days ago • 10