CurMIM: Curriculum Masked Image Modeling
Hao Liu1 Kun Wang1 Yudong Han1 Haocong Wang1 Yupeng Hu1 Chunxiao Wang2 Liqiang Nie3
1School of Software, Shandong University, Jinan, China
2Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
3School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
This is the official PyTorch implementation of CurMIM, a curriculum-based masked image modeling framework for self-supervised visual representation learning.
Paper: CurMIM: Curriculum Masked Image Modeling
GitHub Repository: iLearn-Lab/ICASSP25-CurMIM
Model Information
1. Model Name
CurMIM (Curriculum Masked Image Modeling).
2. Task Type & Applicable Tasks
- Task Type: Masked Image Modeling (MIM) / Self-Supervised Visual Representation Learning / Vision Transformer Pretraining
- Applicable Tasks: Curriculum-based masked image pretraining, visual representation learning, finetuning, and linear probing for image classification.
3. Project Introduction
Masked Image Modeling (MIM) usually adopts a fixed masking strategy during pretraining. CurMIM introduces a curriculum-style masking strategy that progressively adjusts masking behavior, enabling the model to learn from easier to harder reconstruction targets and thereby improving representation quality.
The repository provides a complete workflow for pretraining, finetuning, and linear probing, together with utilities for distributed training and experiment management.
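To make the curriculum idea concrete, the easy-to-hard progression can be illustrated with a simple mask-ratio schedule that anneals from a low (easy) to a high (hard) masking ratio over pretraining. This is only an illustrative sketch, not the schedule used in the paper; the function and parameter names are ours, and the actual CurMIM schedule is defined in the repository code.

```python
def curriculum_mask_ratio(epoch: int, total_epochs: int,
                          start_ratio: float = 0.5,
                          end_ratio: float = 0.75) -> float:
    """Illustrative linear curriculum: anneal the mask ratio from an easy
    (low) value to a hard (high) value as pretraining progresses.

    Note: this sketch is NOT the schedule from the CurMIM paper; it only
    demonstrates the easy-to-hard masking principle.
    """
    # Fraction of training completed, clamped to [0, 1].
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start_ratio + (end_ratio - start_ratio) * t
```

The schedule starts at the easier reconstruction task (less masking) and ends at the standard MAE-style ratio of 0.75 used in the pretraining command below.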
4. Training Data Source
The model follows the dataset preparation protocol of MAE and is mainly designed for:
- ImageNet
- miniImageNet
Usage & Basic Inference
This codebase provides scripts for curriculum-based MIM pretraining, finetuning, and linear probing.
Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies:
git clone https://github.com/iLearn-Lab/ICASSP25-CurMIM.git
cd ICASSP25-CurMIM
python -m venv .venv
source .venv/bin/activate # Linux / Mac
# .venv\Scripts\activate # Windows
pip install torch torchvision timm==0.3.2 tensorboard
# Note: timm==0.3.2 follows the MAE codebase and may require a small fix to work with newer PyTorch versions.
Step 2: Prepare the Data
Follow MAE's dataset preparation protocol for ImageNet.
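MAE's preparation assumes the standard torchvision ImageFolder layout with separate train and validation splits; the directory and file names below are illustrative:

```
data/
├── train/
│   ├── class_0/
│   │   ├── img_0.jpeg
│   │   └── ...
│   └── class_1/
│       └── ...
└── val/
    ├── class_0/
    │   └── ...
    └── class_1/
        └── ...
```

Point `--data_path` at the root directory (here, `data/`).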
Step 3: Run Pretraining / Finetuning
To pretrain the model, run:
python -m torch.distributed.launch --nproc_per_node {GPU_number} ./main_pretrain.py --batch_size 128 \
--accum_iter 2 \
--model {model_type} \
--mask_ratio 0.75 --epochs 300 --warmup_epochs 40 \
--blr 4e-4 --weight_decay 0.05 \
--data_path ../path --output_dir ./output_dir/
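Note that `--blr` is a base learning rate, not the actual one. Following the MAE codebase convention, the actual learning rate is obtained by linearly scaling the base rate with the effective batch size (per-GPU batch size × gradient accumulation steps × number of GPUs) relative to 256. A sketch of that rule (function and parameter names here are ours):

```python
def effective_lr(blr: float, batch_size: int, accum_iter: int, num_gpus: int) -> float:
    """MAE-style linear learning-rate scaling: lr = blr * total_batch_size / 256.

    batch_size is per GPU; the total batch size also multiplies in gradient
    accumulation steps and the number of GPUs.
    """
    total_batch_size = batch_size * accum_iter * num_gpus
    return blr * total_batch_size / 256
```

For example, with the pretraining command above on 8 GPUs (`--batch_size 128 --accum_iter 2 --blr 4e-4`), the total batch size is 2048 and the actual learning rate is 3.2e-3.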
To finetune the model, run:
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_finetune.py \
--batch_size 128 \
--nb_classes {nb_classes} \
--model {model_type} \
--finetune ./checkpoint.pth \
--epochs 100 \
--blr 1e-3 --layer_decay 0.65 --output_dir ./finetune \
--weight_decay 0.05 --drop_path 0.1 --mixup 0.8 --cutmix 1.0 --reprob 0.25 \
--dist_eval --data_path ../data/
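The `--finetune` flag above points at a pretrained checkpoint. As a minimal sketch of how such a checkpoint can be loaded into an encoder for downstream use, assuming the MAE convention of storing the weights under a `model` key (the function name is ours; the repository's finetuning script handles this internally):

```python
import torch


def load_pretrained_encoder(model: torch.nn.Module, ckpt_path: str):
    """Load an MAE-style checkpoint into an encoder.

    Assumption (MAE convention): the state dict may be nested under the
    'model' key. Non-matching keys (e.g. decoder weights) are skipped via
    strict=False; the returned message lists missing/unexpected keys.
    """
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("model", ckpt)  # unwrap if nested, else use as-is
    msg = model.load_state_dict(state, strict=False)
    return msg
```

Inspecting the returned message is a quick sanity check that the encoder weights were actually matched.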
Limitations & Notes
Disclaimer: This repository is intended for academic research purposes only.
- The model requires access to the original datasets for pretraining and downstream evaluation.
- Training performance may vary depending on model size, masking ratio, and distributed training configuration.
- Users should prepare the dataset following the MAE protocol before attempting to reproduce the results.
Citation
If you find our work useful in your research, please consider citing our paper:
@inproceedings{liu2025curmim,
title={CurMIM: Curriculum Masked Image Modeling},
author={Liu, Hao and Wang, Kun and Han, Yudong and Wang, Haocong and Hu, Yupeng and Wang, Chunxiao and Nie, Liqiang},
booktitle={2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2025},
doi={10.1109/ICASSP49660.2025.10890877}
}
Contact
If you have any questions, feel free to contact me at liuh90210@gmail.com.