MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

🚀 Introduction

MedITok is the first unified visual tokenizer for medical images, introduced in "Unified Medical Image Tokenizer for Autoregressive Synthesis and Understanding". Trained on 33M medical images and 2M image-caption pairs via a two-stage representation learning framework, MedITok:

effectively encodes visual details and clinical semantics into a unified token space;
achieves state-of-the-art performance across diverse medical imaging modalities and tasks;
can be incorporated into prevalent generative models (e.g., autoregressive architectures) for downstream medical image synthesis and interpretation.

This work is supported by Shanghai Innovation Institute (SII).

🎯 Sample Usage

Image feature extraction

The following snippet demonstrates how to use the model for extracting features (requires the model implementation from the official repository):

import torch
import numpy as np
from PIL import Image

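def read_image(img, img_size=256):
    # Accept either a file path or an already opened PIL image
    if isinstance(img, str):
        img = Image.open(img)

    if isinstance(img, Image.Image):
        img = img.convert('RGB')
        # Compare both width and height so non-square inputs are also resized
        if img.size != (img_size, img_size):
            img = img.resize((img_size, img_size), Image.LANCZOS)
    return img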

def image_to_tensor(x):
    # [H, W, C] uint8 -> [1, C, H, W] float32
    x = torch.from_numpy(np.array(x)).float().permute(2, 0, 1)
    # Scale pixel values from [0, 255] to [-1, 1]
    x = (x / 255.) * 2. - 1.
    return x.unsqueeze(0)

# Assuming 'net' is the loaded MedITok model
img_path = 'assets/vis_imgs/sample1.png'
img = read_image(img_path)
x = image_to_tensor(img)
with torch.no_grad():
    f = net.forward_features(x)
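
The snippet above processes a single image. As a minimal sketch, the same preprocessing can be applied to several images at once, and the [-1, 1] scaling can be inverted to get displayable pixels back. The helper names `to_batch` and `to_uint8` are hypothetical and not part of the official repository, and the synthetic images only stand in for real scans:

```python
import numpy as np
import torch
from PIL import Image


def to_batch(images, img_size=256):
    """Stack PIL images into one [B, 3, H, W] float tensor scaled to [-1, 1].

    Hypothetical helper: mirrors read_image + image_to_tensor for a list of images.
    """
    tensors = []
    for img in images:
        img = img.convert('RGB')
        if img.size != (img_size, img_size):
            img = img.resize((img_size, img_size), Image.LANCZOS)
        arr = np.asarray(img, dtype=np.float32)      # [H, W, 3], values in [0, 255]
        t = torch.from_numpy(arr).permute(2, 0, 1)   # [3, H, W]
        tensors.append(t / 255.0 * 2.0 - 1.0)        # scale to [-1, 1]
    return torch.stack(tensors, dim=0)


def to_uint8(x):
    """Invert the scaling: [B, 3, H, W] float in [-1, 1] -> [B, H, W, 3] uint8 array."""
    x = (x.clamp(-1.0, 1.0) + 1.0) / 2.0 * 255.0
    return x.round().to(torch.uint8).permute(0, 2, 3, 1).cpu().numpy()


# Synthetic stand-ins for real scans: a non-square grayscale image and a black RGB image.
dummy_images = [
    Image.fromarray(np.full((128, 192), 255, dtype=np.uint8)),
    Image.new('RGB', (256, 256), color=(0, 0, 0)),
]
batch = to_batch(dummy_images)   # shape [2, 3, 256, 256]
restored = to_uint8(batch)       # shape [2, 256, 256, 3]
```

With a loaded model, `batch` could presumably be passed to `net.forward_features` in place of `x`; whether that method accepts batch sizes larger than one is an assumption that should be checked against the official implementation.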

✏️ Citation

@article{ma2025meditok,
  title={MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation},
  author={Ma, Chenglong and Ji, Yuanfeng and Ye, Jin and Li, Zilong and Wang, Chenhui and Ning, Junzhi and Li, Wei and Liu, Lihao and Guo, Qiushan and Li, Tianbin and He, Junjun and Shan, Hongming},
  journal={arXiv preprint arXiv:2505.19225},
  year={2025}
}

Paper • 2505.19225 • Published May 25, 2025