MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation
Paper (arXiv:2505.19225) • Hugging Face • GitHub
MedITok is the first unified visual tokenizer for medical images, introduced in "Unified Medical Image Tokenizer for Autoregressive Synthesis and Understanding". It is trained on 33M medical images and 2M image-caption pairs via a two-stage representation learning framework.
This work is supported by Shanghai Innovation Institute (SII).
The following snippet demonstrates how to use the model for extracting features (requires the model implementation from the official repository):
import torch
import numpy as np
from PIL import Image


def read_image(img, img_size=256):
    # Accept either a file path or an already-loaded PIL image.
    if isinstance(img, str):
        img = Image.open(img)
    if isinstance(img, Image.Image):
        img = img.convert('RGB')
        # Resize unless the image is already img_size x img_size
        # (checks both width and height, so non-square inputs are handled).
        if img.size != (img_size, img_size):
            img = img.resize((img_size, img_size), Image.LANCZOS)
    return img


def image_to_tensor(x):
    # [H, W, C] uint8 -> [1, C, H, W] float32
    x = torch.from_numpy(np.array(x)).float().permute(2, 0, 1)
    # Scale pixel values from [0, 255] to [-1, 1].
    x = (x / 255.) * 2. - 1.
    return x.unsqueeze(0)


# Assuming 'net' is the loaded MedITok model
img_path = 'assets/vis_imgs/sample1.png'
img = read_image(img_path)
x = image_to_tensor(img)

# Disable gradient tracking for inference.
with torch.no_grad():
    f = net.forward_features(x)
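The `image_to_tensor` helper above maps pixel values from [0, 255] to [-1, 1]. The sketch below reproduces that scaling with NumPy alone and adds its inverse, which can be handy when converting a [-1, 1] array back into a viewable image. The names `normalize` and `denormalize` are hypothetical helpers for illustration and are not part of the MedITok repository; the input is random synthetic data, not a medical image.

```python
import numpy as np


def normalize(pixels):
    """Map uint8 pixel values in [0, 255] to float32 values in [-1, 1]."""
    return (np.asarray(pixels, dtype=np.float32) / 255.0) * 2.0 - 1.0


def denormalize(x):
    """Map float values in [-1, 1] back to uint8 pixel values in [0, 255]."""
    x = np.clip(np.asarray(x, dtype=np.float32), -1.0, 1.0)
    return np.round((x + 1.0) / 2.0 * 255.0).astype(np.uint8)


# Round-trip check on a synthetic 256x256 RGB array (hypothetical data).
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
scaled = normalize(fake_image)
restored = denormalize(scaled)
```

Clipping before the inverse mapping is a design choice here: values slightly outside [-1, 1] would otherwise wrap around when cast to `uint8`.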
@article{ma2025meditok,
title={MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation},
author={Ma, Chenglong and Ji, Yuanfeng and Ye, Jin and Li, Zilong and Wang, Chenhui and Ning, Junzhi and Li, Wei and Liu, Lihao and Guo, Qiushan and Li, Tianbin and He, Junjun and Shan, Hongming},
journal={arXiv preprint arXiv:2505.19225},
year={2025}
}