Link paper and GitHub repository

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +14 -8
README.md CHANGED
@@ -1,21 +1,21 @@
  ---
- license: mit
- tags:
- - change captioning
- - vision-language
- - image-to-text
- - procedural reasoning
- - multimodal
- - pytorch
  datasets:
  - clevr-change
  - image-editing-request
  - spot-the-diff
+ license: mit
  metrics:
  - bleu
  - meteor
  - rouge
  pipeline_tag: image-to-text
+ tags:
+ - change captioning
+ - vision-language
+ - image-to-text
+ - procedural reasoning
+ - multimodal
+ - pytorch
  ---

  # ProCap: Experiment Materials
@@ -24,6 +24,12 @@ This repository contains the **official experimental materials** for the paper:

  > **Imagine How to Change: Explicit Procedure Modeling for Change Captioning**

+ [[Paper](https://huggingface.co/papers/2603.05969)] [[Code](https://github.com/BlueberryOreo/ProCap)]
+
+ ProCap is a framework that reformulates change modeling from static image comparison to dynamic procedure modeling. It features a two-stage design:
+ 1. **Explicit Procedure Modeling**: Trains a procedure encoder to learn the change procedure from a sparse set of keyframes.
+ 2. **Implicit Procedure Captioning**: Integrates the trained encoder within an encoder-decoder model for captioning using learnable procedure queries.
+
  It provides **processed datasets**, **pre-trained model weights**, and **evaluation tools** for reproducing the results reported in the paper.

  📦 All materials are also available via [Baidu Netdisk](https://pan.baidu.com/s/1t_YXB6J_vkuPxByn2hat2A)
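
Reviewer note: the first hunk moves `license` and `tags` but does not appear to change any metadata value. The sketch below is one way to check that, assuming the two front-matter blocks are copied verbatim from the diff. `parse_front_matter` is a hypothetical helper written for this note; it only handles the flat `key: value` and `- item` subset of YAML used here and is not a general YAML parser.

```python
# Front matter before and after the change, copied from the diff above.
OLD_FRONT_MATTER = """---
license: mit
tags:
- change captioning
- vision-language
- image-to-text
- procedural reasoning
- multimodal
- pytorch
datasets:
- clevr-change
- image-editing-request
- spot-the-diff
metrics:
- bleu
- meteor
- rouge
pipeline_tag: image-to-text
---"""

NEW_FRONT_MATTER = """---
datasets:
- clevr-change
- image-editing-request
- spot-the-diff
license: mit
metrics:
- bleu
- meteor
- rouge
pipeline_tag: image-to-text
tags:
- change captioning
- vision-language
- image-to-text
- procedural reasoning
- multimodal
- pytorch
---"""


def parse_front_matter(text):
    """Hypothetical helper: parse flat 'key: value' / '- item' front matter."""
    lines = text.strip().splitlines()
    if lines[0].strip() != "---" or lines[-1].strip() != "---":
        raise ValueError("front matter must be delimited by '---' lines")
    data = {}
    list_key = None  # key currently collecting '- item' entries, if any
    for raw in lines[1:-1]:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            if list_key is None:
                raise ValueError("list item without a preceding list key")
            data[list_key].append(line[2:].strip())
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            data[key] = value   # scalar entry such as 'license: mit'
            list_key = None
        else:
            data[key] = []      # list entry such as 'tags:'
            list_key = key
    return data


old = parse_front_matter(OLD_FRONT_MATTER)
new = parse_front_matter(NEW_FRONT_MATTER)

# Dict equality ignores key order, so this is True only if every key and
# value (including the order of items inside each list) is unchanged.
same_content = old == new
# True if the new top-level keys are in alphabetical order.
new_keys_sorted = list(new) == sorted(new)

print("metadata unchanged:", same_content)
print("old key order:", list(old))
print("new key order:", list(new))
print("new keys alphabetical:", new_keys_sorted)
```

If both checks pass, the front-matter hunk is a pure reordering of keys, and the substantive change in this PR is limited to the paper and code links and the method summary added in the second hunk.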