Ilya Pereverzin

NodeLinker · PlyMxt

AI & ML interests

Isn't it amazing that we let a computer think like a human?

Recent Activity

upvoted 2 papers about 15 hours ago

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Paper • 2602.12205 • Published 3 days ago • 73

Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception

Paper • 2602.11858 • Published 3 days ago • 39

upvoted a collection about 15 hours ago

Zooming-without-Zooming

6 items • Updated 1 day ago • 4

upvoted a collection 4 days ago

Ming-V2

Ming is the multi-modal series of any-to-any models developed by Ant Ling team. • 14 items • Updated 1 day ago • 33

upvoted 2 papers 4 days ago

Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Paper • 2510.24821 • Published Oct 28, 2025 • 40

Ming-Omni: A Unified Multimodal Model for Perception and Generation

Paper • 2506.09344 • Published Jun 11, 2025 • 30

upvoted a collection 5 days ago

DINOv3

DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21, 2025 • 491

upvoted 3 papers 5 days ago

DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 297

Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

Paper • 2408.00298 • Published Aug 1, 2024 • 11

The Manga Whisperer: Automatically Generating Transcriptions for Comics

Paper • 2401.10224 • Published Jan 18, 2024 • 3

upvoted a paper 6 days ago

PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing

Paper • 2601.21957 • Published 17 days ago • 19

upvoted a paper 7 days ago

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 332

upvoted a paper 8 days ago

Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 272

upvoted a collection 8 days ago

Qwen3-abliterated

32 items • Updated Dec 22, 2025 • 48

upvoted a paper 9 days ago

LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

Paper • 2601.14251 • Published 26 days ago • 24

upvoted 4 collections 9 days ago

PaddleOCR-VL

Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model • 5 items • Updated 4 days ago • 28

PaddleOCR-VL-1.5

Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing • 6 items • Updated 4 days ago • 9

LightOnOCR-2 🦉

LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated 25 days ago • 22

Step-3.5-Flash

step 3.5 models • 6 items • Updated 3 days ago • 30

upvoted a paper 11 days ago

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Paper • 2601.19798 • Published 19 days ago • 42