Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data Jun 3, 2025 • 306
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 203
AI for Disability Collection A collection of datasets, models, spaces and papers that use AI to address a disability-related topic. • 4 items • Updated Jun 10, 2025 • 3
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5, 2025 • 298
Multimodal Chaptering for Long-Form TV Newscast Video Paper • 2406.17590 • Published Mar 20, 2024 • 2
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 133
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos Paper • 2407.12679 • Published Jul 17, 2024 • 8
Towards Retrieval Augmented Generation over Large Video Libraries Paper • 2406.14938 • Published Jun 21, 2024 • 22
Inserting Faces inside Captions: Image Captioning with Attention Guided Merging Paper • 2405.02305 • Published Mar 20, 2024 • 2