VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
Abstract
VFIG is a vision-language model family for converting raster images to scalable vector graphics using a large dataset and hierarchical training approach, achieving performance comparable to proprietary models.
Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or JPEG) that are difficult to modify or scale. Manually reconstructing these figures is a prohibitively labor-intensive process, requiring specialized expertise to recover the original geometric intent. To bridge this gap, we propose VFIG, a family of Vision-Language Models trained for complex and high-fidelity figure-to-SVG conversion. While this task is inherently data-driven, existing datasets are typically small-scale and lack the complexity of professional diagrams. We address this by introducing VFIG-DATA, a large-scale dataset of 66K high-quality figure-SVG pairs, curated from a diverse mix of real-world paper figures and procedurally generated diagrams. Recognizing that SVGs are composed of recurring primitives and hierarchical local structures, we introduce a coarse-to-fine training curriculum that begins with supervised fine-tuning (SFT) to learn atomic primitives and transitions to reinforcement learning (RL) refinement to optimize global diagram fidelity, layout consistency, and topological edge cases. Finally, we introduce VFIG-BENCH, a comprehensive evaluation suite with novel metrics designed to measure the structural integrity of complex figures. VFIG achieves state-of-the-art performance among open-source models and performs on par with GPT-5.2, achieving a VLM-Judge score of 0.829 on VFIG-BENCH.
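The idea that SVGs decompose into recurring atomic primitives can be made concrete. Below is an illustrative sketch, not the paper's actual VFIG-BENCH metric, of extracting primitive tags from an SVG string with Python's standard library and scoring a predicted SVG against a reference by the F1 overlap of their primitive histograms. All function names here are hypothetical.

```python
from collections import Counter
from xml.etree import ElementTree


def primitive_histogram(svg: str) -> Counter:
    """Count SVG primitive tags (rect, circle, path, ...) in a document."""
    root = ElementTree.fromstring(svg)
    # Strip XML namespaces so '{http://www.w3.org/2000/svg}rect' -> 'rect'.
    return Counter(el.tag.rsplit("}", 1)[-1] for el in root.iter())


def primitive_f1(pred_svg: str, ref_svg: str) -> float:
    """F1 overlap between the primitive histograms of two SVGs."""
    pred = primitive_histogram(pred_svg)
    ref = primitive_histogram(ref_svg)
    overlap = sum((pred & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


ref = '<svg xmlns="http://www.w3.org/2000/svg"><rect/><rect/><circle/></svg>'
pred = '<svg xmlns="http://www.w3.org/2000/svg"><rect/><circle/></svg>'
print(round(primitive_f1(pred, ref), 3))  # prediction misses one rect
```

A histogram-level score like this ignores geometry and layout entirely, which is precisely why the paper pairs primitive-level supervision with RL refinement for global fidelity.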
Community
Ever come across a beautiful Figure 1 in a paper, only to wish you could easily edit and adapt it for your own use?
Check out our new work VFig: Vectorizing Complex Figures in SVG with Vision-Language Models! It is a specialized VLM that converts any diagram – from simple to complex – into clean, editable SVG code.
Built on Qwen3-VL 4B with SFT & RL, it matches GPT-5.2's performance on converting complex diagrams into SVG code (as judged by GPT and Gemini), and it outperforms both open-source generalists and SVG specialists across diagrams from simple to complex.
Website: https://vfig-proj.github.io/
Demo: https://huggingface.co/spaces/allenai/VFig-Image2SVG-Demo
Paper: https://arxiv.org/pdf/2603.24575
Model: https://huggingface.co/collections/QijiaHe/vfig
Code: https://github.com/RAIVNLab/VFig
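To give a feel for what the RL refinement stage optimizes, here is a toy reward sketch under loud assumptions: this is not the released training code, and the reward shape is a hypothetical stand-in. It returns zero for malformed SVG output (a common failure mode for generative models) and otherwise scores the Jaccard overlap of primitive-tag multisets between prediction and reference.

```python
from collections import Counter
from xml.etree import ElementTree


def _tags(root) -> Counter:
    """Multiset of element tags with XML namespaces stripped."""
    return Counter(el.tag.rsplit("}", 1)[-1] for el in root.iter())


def svg_reward(pred_svg: str, ref_svg: str) -> float:
    """Toy reward: 0.0 for unparsable output, else Jaccard overlap
    of primitive-tag multisets (1.0 means identical primitive counts)."""
    try:
        pred_root = ElementTree.fromstring(pred_svg)
    except ElementTree.ParseError:
        return 0.0  # malformed SVG earns no reward
    ref_root = ElementTree.fromstring(ref_svg)
    pred, ref = _tags(pred_root), _tags(ref_root)
    inter = sum((pred & ref).values())
    union = sum((pred | ref).values())
    return inter / union


print(svg_reward("<svg><rect/><circle/></svg>", "<svg><rect/><rect/></svg>"))
print(svg_reward("<svg><rect/>", "<svg><rect/></svg>"))  # unclosed tag
```

A real render-and-compare reward would rasterize the predicted SVG and measure pixel- or perceptual-level fidelity against the target image; the validity gate above is just the cheapest component of such a signal.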
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- WildSVG: Towards Reliable SVG Generation Under Real-World Conditions (2026)
- IntroSVG: Learning from Rendering Feedback for Text-to-SVG Generation via an Introspective Generator-Critic Framework (2026)
- TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning (2026)
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models (2026)
- Multimodal OCR: Parse Anything from Documents (2026)
- From Tokens to Numbers: Continuous Number Modeling for SVG Generation (2026)
- FireRed-OCR Technical Report (2026)