Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval
Hao Liu1, Yupeng Hu1, Kun Wang1, Yinwei Wei1, Liqiang Nie2
1School of Software, Shandong University, Jinan, China
2School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
This is the official PyTorch implementation of GOAL, a frame-supervised Video Moment Retrieval (VMR) framework that performs elastic boundary localization through a game-based paradigm and a Dynamic Updating Technique (DUT).
Paper: SIGIR 2025
GitHub Repository: iLearn-Lab/SIGIR25-GOAL
Model Information
1. Model Name
GOAL (Gaming fOr elAstic Localization).
2. Task Type & Applicable Tasks
- Task Type: Frame-Supervised Video Moment Retrieval (VMR) / Temporal Localization / Vision-Language Learning
- Applicable Tasks: Retrieving the temporal moment in a video that matches a natural language query using a single annotated frame, with a focus on ambiguous temporal boundary localization.
3. Project Introduction
Frame-supervised Video Moment Retrieval (VMR) aims to retrieve the temporal moment in a video that matches a natural language query, using only a single annotated frame as supervision. This setting reduces annotation cost, but it leaves the temporal boundaries of the moment severely ambiguous.
GOAL addresses this challenge through a game-based paradigm with three players, namely KFP, AFP, and BP, together with a Dynamic Updating Technique (DUT) that progressively refines boundary decisions through unilateral and bilateral updates for more elastic localization.
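To make the supervision setting concrete, here is a minimal sketch of a frame-supervised sample and of why a single annotated frame leaves the boundaries ambiguous. It illustrates only the problem setting, not GOAL's game-based model or DUT. The class `MomentSample`, its field names, and the helper `consistent_spans` are hypothetical and are not taken from this repository.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[float, float]


@dataclass
class MomentSample:
    """One query-video pair. All field names are hypothetical."""
    video_id: str
    query: str
    duration: float                 # video length in seconds
    annotated_time: float           # timestamp of the single annotated frame
    span: Optional[Span] = None     # (start, end); unavailable under frame supervision

    def is_frame_supervised(self) -> bool:
        return self.span is None


def consistent_spans(sample: MomentSample, step: float) -> List[Span]:
    """List every span on a `step`-second grid that contains the annotated frame."""
    n = int(sample.duration // step)
    grid = [i * step for i in range(n + 1)]
    return [
        (s, e)
        for s in grid
        for e in grid
        if s < e and s <= sample.annotated_time <= e
    ]


sample = MomentSample(
    video_id="v_example",           # made-up identifier
    query="a person opens the door",
    duration=10.0,
    annotated_time=4.0,
)
# Even on a coarse 2-second grid, many spans agree with the one labeled frame.
print(len(consistent_spans(sample, step=2.0)))  # 11
```

Every span in that list is equally compatible with the annotation, which is the boundary ambiguity the paragraph above refers to.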
4. Training Data Source
The model is trained and evaluated on standard frame-supervised VMR benchmarks:
- ActivityNet Captions
- Charades-STA
- TACoS
Usage & Basic Inference
This codebase provides training and evaluation scripts for frame-supervised VMR, as well as checkpoints for quick reproduction.
Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies:
git clone https://github.com/iLearn-Lab/SIGIR25-GOAL.git
cd SIGIR25-GOAL
python -m venv .venv
source .venv/bin/activate # Linux / Mac
# .venv\Scripts\activate # Windows
pip install numpy scipy pyyaml tqdm
Step 2: Download Model Weights & Data
Prepare features and raw annotations following ViGA's dataset preparation protocol.
Before running the code, please check and replace local dataset and feature paths in:
- src/config.yaml
- src/utils/utils.py
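A quick way to catch stale local paths before a run is to walk the loaded configuration and report path-like strings that do not exist on disk. The sketch below uses only the standard library and operates on a nested dict such as the one `yaml.safe_load` returns. The key names in the example are invented, since the actual layout of `src/config.yaml` is not described here, and paths hard-coded in `src/utils/utils.py` still need to be edited by hand.

```python
import os
from typing import Any, Iterator, List, Tuple


def iter_string_values(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for every string leaf in a nested dict/list."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_string_values(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_string_values(value, f"{prefix}[{index}]")
    elif isinstance(node, str):
        yield prefix, node


def missing_paths(config: dict) -> List[Tuple[str, str]]:
    """Return (key, path) pairs for path-like strings that do not exist."""
    return [
        (key, value)
        for key, value in iter_string_values(config)
        if ("/" in value or os.sep in value) and not os.path.exists(value)
    ]


# Hypothetical config contents. In practice, load the real file instead, e.g.
#   import yaml; config = yaml.safe_load(open("src/config.yaml"))
config = {
    "dataset": {"name": "charades", "feature_dir": "/path/that/does/not/exist"},
    "work_dirs": [os.getcwd()],
}
for key, path in missing_paths(config):
    print(f"missing: {key} -> {path}")
```

Strings without a path separator (such as a dataset name) are ignored, so the check may miss bare relative file names.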
Step 3: Run Inference
To evaluate a trained experiment folder, run:
python -m src.experiment.eval --exp path/to/your/experiment_folder
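For readers interpreting evaluation results, VMR work commonly reports recall at rank 1 under a temporal IoU threshold (often written "R@1, IoU=m"). The sketch below shows how such a score can be computed. It is a generic illustration, and whether `src.experiment.eval` reports exactly these numbers is an assumption. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two (start, end) spans in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: Sequence[Span], gts: Sequence[Span], threshold: float) -> float:
    """Fraction of queries whose top prediction reaches the IoU threshold."""
    if not preds:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


# Toy predictions and ground truths, made up for illustration.
preds = [(2.0, 6.0), (0.0, 10.0)]
gts = [(4.0, 8.0), (0.0, 10.0)]
print(recall_at_1(preds, gts, threshold=0.5))  # 0.5
```

In the toy example, the first prediction overlaps its ground truth with IoU 1/3 and misses the 0.5 threshold, while the second matches exactly.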
Limitations & Notes
Disclaimer: This repository is intended for academic research purposes only.
- The model requires access to the original benchmark datasets and extracted video features for evaluation.
- Some configuration files currently contain local path settings and should be updated before use.
Citation
If you find our work useful in your research, please consider citing our paper:
@inproceedings{liu2025gaming,
title={Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval},
author={Liu, Hao and Hu, Yupeng and Wang, Kun and Wei, Yinwei and Nie, Liqiang},
booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
year={2025},
doi={10.1145/3726302.3729984}
}
Contact
If you have any questions, feel free to contact me at liuh90210@gmail.com.