DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published 3 days ago • 73
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception Paper • 2602.11858 • Published 3 days ago • 39
Ming-V2 Collection Ming is the multi-modal series of any-to-any models developed by Ant Ling team. • 14 items • Updated 1 day ago • 33
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation Paper • 2510.24821 • Published Oct 28, 2025 • 40
Ming-Omni: A Unified Multimodal Model for Perception and Generation Paper • 2506.09344 • Published Jun 11, 2025 • 30
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21, 2025 • 491
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names Paper • 2408.00298 • Published Aug 1, 2024 • 11
The Manga Whisperer: Automatically Generating Transcriptions for Comics Paper • 2401.10224 • Published Jan 18, 2024 • 3
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing Paper • 2601.21957 • Published 17 days ago • 19
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR Paper • 2601.14251 • Published 26 days ago • 24
PaddleOCR-VL Collection Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model • 5 items • Updated 4 days ago • 28
PaddleOCR-VL-1.5 Collection Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing • 6 items • Updated 4 days ago • 9
LightOnOCR-2 🦉 Collection LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated 25 days ago • 22
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision Paper • 2601.19798 • Published 19 days ago • 42