GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published 6 days ago • 12
ViVa: A Video-Generative Value Model for Robot Reinforcement Learning Paper • 2604.08168 • Published 5 days ago • 15
Small Vision-Language Models are Smart Compressors for Long Video Understanding Paper • 2604.08120 • Published 5 days ago • 15
FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On Paper • 2604.08526 • Published 5 days ago • 19
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping Paper • 2604.08364 • Published 5 days ago • 92