CurMIM: Curriculum Masked Image Modeling
Hao Liu1 Kun Wang1 Yudong Han1 Haocong Wang1 Yupeng Hu1 Chunxiao Wang2 Liqiang Nie3
1School of Software, Shandong University, Jinan, China
2Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
3School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
This is the official PyTorch implementation of CurMIM, a curriculum-based masked image modeling framework for self-supervised visual representation learning.
Paper: CurMIM: Curriculum Masked Image Modeling
GitHub Repository: iLearn-Lab/ICASSP25-CurMIM
Model Information
1. Model Name
CurMIM (Curriculum Masked Image Modeling).
2. Task Type & Applicable Tasks
- Task Type: Masked Image Modeling (MIM) / Self-Supervised Visual Representation Learning / Vision Transformer Pretraining
- Applicable Tasks: Curriculum-based masked image pretraining, visual representation learning, finetuning, and linear probing for image classification.
3. Project Introduction
Masked Image Modeling (MIM) usually adopts a fixed masking strategy during pretraining. CurMIM introduces a curriculum-style masking strategy that progressively adjusts masking behavior, enabling the model to learn from easier to harder reconstruction targets and thereby improving representation quality.
The repository provides a complete workflow for pretraining, finetuning, and linear probing, together with utilities for distributed training and experiment management.
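To make the curriculum idea concrete, the easy-to-hard progression can be illustrated with a simple mask-ratio schedule that anneals from a low (easy) to a high (hard) masking ratio over pretraining. This is only an illustrative sketch, not the schedule used in the paper; the function and parameter names are ours, and the actual CurMIM schedule is defined in the repository code.

```python
def curriculum_mask_ratio(epoch: int, total_epochs: int,
                          start_ratio: float = 0.5,
                          end_ratio: float = 0.75) -> float:
    """Illustrative linear curriculum: anneal the mask ratio from an easy
    (low) value to a hard (high) value as pretraining progresses.

    Note: this sketch is NOT the schedule from the CurMIM paper; it only
    demonstrates the easy-to-hard masking principle.
    """
    # Fraction of training completed, clamped to [0, 1].
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start_ratio + (end_ratio - start_ratio) * t
```

The schedule starts at the easier reconstruction task (less masking) and ends at the standard MAE-style ratio of 0.75 used in the pretraining command below.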
4. Training Data Source
The model follows the dataset preparation protocol of MAE and is mainly designed for:
- ImageNet
- miniImageNet
Usage & Basic Inference
This codebase provides scripts for curriculum-based MIM pretraining, finetuning, and linear probing.
Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies:
git clone https://github.com/iLearn-Lab/ICASSP25-CurMIM.git
cd ICASSP25-CurMIM
python -m venv .venv
source .venv/bin/activate # Linux / Mac
# .venv\Scripts\activate # Windows
pip install torch torchvision timm==0.3.2 tensorboard
# Note: timm==0.3.2 follows the MAE codebase and may require a small fix to work with newer PyTorch versions.
Step 2: Prepare the Data
Follow MAE's dataset preparation protocol for ImageNet.
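MAE's preparation assumes the standard torchvision ImageFolder layout with separate train and validation splits; the directory and file names below are illustrative:

```
data/
├── train/
│   ├── class_0/
│   │   ├── img_0.jpeg
│   │   └── ...
│   └── class_1/
│       └── ...
└── val/
    ├── class_0/
    │   └── ...
    └── class_1/
        └── ...
```

Point `--data_path` at the root directory (here, `data/`).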
Step 3: Run Pretraining / Finetuning
To pretrain the model, run:
python -m torch.distributed.launch --nproc_per_node {GPU_number} ./main_pretrain.py --batch_size 128 \
--accum_iter 2 \
--model {model_type} \
--mask_ratio 0.75 --epochs 300 --warmup_epochs 40 \
--blr 4e-4 --weight_decay 0.05 \
--data_path ../path --output_dir ./output_dir/
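Note that `--blr` is a base learning rate, not the actual one. Following the MAE codebase convention, the actual learning rate is obtained by linearly scaling the base rate with the effective batch size (per-GPU batch size × gradient accumulation steps × number of GPUs) relative to 256. A sketch of that rule (function and parameter names here are ours):

```python
def effective_lr(blr: float, batch_size: int, accum_iter: int, num_gpus: int) -> float:
    """MAE-style linear learning-rate scaling: lr = blr * total_batch_size / 256.

    batch_size is per GPU; the total batch size also multiplies in gradient
    accumulation steps and the number of GPUs.
    """
    total_batch_size = batch_size * accum_iter * num_gpus
    return blr * total_batch_size / 256
```

For example, with the pretraining command above on 8 GPUs (`--batch_size 128 --accum_iter 2 --blr 4e-4`), the total batch size is 2048 and the actual learning rate is 3.2e-3.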
To finetune the model, run:
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_finetune.py \
--batch_size 128 \
--nb_classes {nb_classes} \
--model {model_type} \
--finetune ./checkpoint.pth \
--epochs 100 \
--blr 1e-3 --layer_decay 0.65 --output_dir ./finetune \
--weight_decay 0.05 --drop_path 0.1 --mixup 0.8 --cutmix 1.0 --reprob 0.25 \
--dist_eval --data_path ../data/
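The `--finetune` flag above points at a pretrained checkpoint. As a minimal sketch of how such a checkpoint can be loaded into an encoder for downstream use, assuming the MAE convention of storing the weights under a `model` key (the function name is ours; the repository's finetuning script handles this internally):

```python
import torch


def load_pretrained_encoder(model: torch.nn.Module, ckpt_path: str):
    """Load an MAE-style checkpoint into an encoder.

    Assumption (MAE convention): the state dict may be nested under the
    'model' key. Non-matching keys (e.g. decoder weights) are skipped via
    strict=False; the returned message lists missing/unexpected keys.
    """
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("model", ckpt)  # unwrap if nested, else use as-is
    msg = model.load_state_dict(state, strict=False)
    return msg
```

Inspecting the returned message is a quick sanity check that the encoder weights were actually matched.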
Limitations & Notes
Disclaimer: This repository is intended for academic research purposes only.
- The model requires access to the original datasets for pretraining and downstream evaluation.
- Training performance may vary depending on model size, masking ratio, and distributed training configuration.
- Users should prepare the dataset following the MAE protocol before attempting to reproduce the results.
Citation
If you find our work useful in your research, please consider citing our paper:
@inproceedings{liu2025curmim,
title={CurMIM: Curriculum Masked Image Modeling},
author={Liu, Hao and Wang, Kun and Han, Yudong and Wang, Haocong and Hu, Yupeng and Wang, Chunxiao and Nie, Liqiang},
booktitle={2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2025},
doi={10.1109/ICASSP49660.2025.10890877}
}
Contact
If you have any questions, feel free to contact me at liuh90210@gmail.com.