view article Article TimeScope: How Long Can Your Video Large Multimodal Model Go? +2 Jul 23, 2025 • 46
view article Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data +7 Jun 3, 2025 • 305
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch +5 May 21, 2025 • 247
view article Article SmolVLM Grows Smaller – Introducing the 256M & 500M Models! +1 Jan 23, 2025 • 189
view article Article LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? Jul 25, 2024 • 17
view article Article LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? Jul 25, 2024 • 17
view article Article Docmatix - a huge dataset for Document Visual Question Answering Jul 18, 2024 • 77
view article Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models +1 Jun 24, 2024 • 205