BlueberryOreo/ProCap


Link paper and GitHub repository #1

by nielsr (HF Staff), opened 5 days ago

base: refs/heads/main ← from: refs/pr/1


Files changed (1)

README.md +14 -8


@@ -1,21 +1,21 @@
 ---
-license: mit
-tags:
-- change captioning
-- vision-language
-- image-to-text
-- procedural reasoning
-- multimodal
-- pytorch
 datasets:
 - clevr-change
 - image-editing-request
 - spot-the-diff
+license: mit
 metrics:
 - bleu
 - meteor
 - rouge
 pipeline_tag: image-to-text
+tags:
+- change captioning
+- vision-language
+- image-to-text
+- procedural reasoning
+- multimodal
+- pytorch
 ---
 # ProCap: Experiment Materials
@@ -24,6 +24,12 @@ This repository contains the **official experimental materials** for the paper:
 > **Imagine How to Change: Explicit Procedure Modeling for Change Captioning**
+[[Paper](https://huggingface.co/papers/2603.05969)] [[Code](https://github.com/BlueberryOreo/ProCap)]
+ProCap is a framework that reformulates change modeling from static image comparison to dynamic procedure modeling. It features a two-stage design:
+1. **Explicit Procedure Modeling**: Trains a procedure encoder to learn the change procedure from a sparse set of keyframes.
+2. **Implicit Procedure Captioning**: Integrates the trained encoder within an encoder-decoder model for captioning using learnable procedure queries.
 It provides **processed datasets**, **pre-trained model weights**, and **evaluation tools** for reproducing the results reported in the paper.
 📦 All materials are also available via [Baidu Netdisk](https://pan.baidu.com/s/1t_YXB6J_vkuPxByn2hat2A)
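The first hunk of this PR moves `license` and `tags` within the README front matter, and the new side lists the top-level keys alphabetically. A reviewer may want to confirm that this is a pure reordering. The Python sketch is a hypothetical check, not part of this PR or of any Hugging Face tooling: `parse_simple_front_matter` is an illustrative helper that only understands the flat `key: value` and `- item` layout used in this README, so it is not a general YAML parser.

```python
# Hypothetical helper, assuming the front matter contains only flat
# "key: value" pairs and "- item" list entries (as in this README).
# It is NOT a general YAML parser.

OLD_FRONT_MATTER = """\
license: mit
tags:
- change captioning
- vision-language
- image-to-text
- procedural reasoning
- multimodal
- pytorch
datasets:
- clevr-change
- image-editing-request
- spot-the-diff
metrics:
- bleu
- meteor
- rouge
pipeline_tag: image-to-text
"""

NEW_FRONT_MATTER = """\
datasets:
- clevr-change
- image-editing-request
- spot-the-diff
license: mit
metrics:
- bleu
- meteor
- rouge
pipeline_tag: image-to-text
tags:
- change captioning
- vision-language
- image-to-text
- procedural reasoning
- multimodal
- pytorch
"""


def parse_simple_front_matter(text: str) -> dict:
    """Turn 'key: value' lines and '- item' entries into a dict."""
    result: dict = {}
    current_key = None  # key whose list items we are currently collecting
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("- "):
            # A list item belongs to the most recent key with no inline value.
            if current_key is None:
                raise ValueError(f"list item without a key: {raw_line!r}")
            result[current_key].append(line[2:])
        else:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                result[key] = value  # scalar value, e.g. "license: mit"
                current_key = None
            else:
                result[key] = []  # start of a list, e.g. "tags:"
                current_key = key
    return result


old_meta = parse_simple_front_matter(OLD_FRONT_MATTER)
new_meta = parse_simple_front_matter(NEW_FRONT_MATTER)

# Dict equality ignores key order, so True means only the ordering changed.
print(old_meta == new_meta)  # True
# The new side's top-level keys are in alphabetical order.
print(list(new_meta) == sorted(new_meta))  # True
```

For anything beyond this simple layout (nested mappings, quoted strings, multi-line values), loading both versions with a real YAML library and comparing the resulting objects would be the safer choice.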