Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval

Hao Liu1  Yupeng Hu1✉  Kun Wang1  Yinwei Wei1  Liqiang Nie2

1School of Software, Shandong University, Jinan, China
2School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China

This is the official PyTorch implementation of GOAL, a frame-supervised Video Moment Retrieval (VMR) framework for elastic boundary localization via a game-based paradigm and Dynamic Updating Technique (DUT).

🔗 Paper: SIGIR 2025 🔗 GitHub Repository: iLearn-Lab/SIGIR25-GOAL


Model Information

1. Model Name

GOAL (Gaming fOr elAstic Localization).

2. Task Type & Applicable Tasks

  • Task Type: Frame-Supervised Video Moment Retrieval (VMR) / Temporal Localization / Vision-Language Learning
  • Applicable Tasks: Retrieving the temporal moment in a video that matches a natural language query using a single annotated frame, with a focus on ambiguous temporal boundary localization.

3. Project Introduction

Frame-supervised Video Moment Retrieval (VMR) aims to retrieve the temporal moment in a video that matches a natural language query using only a single annotated frame. While this setting reduces annotation cost, it brings severe ambiguity in temporal boundary prediction.

GOAL addresses this challenge through a game-based paradigm with three players, namely KFP, AFP, and BP, together with a Dynamic Updating Technique (DUT) that progressively refines boundary decisions through unilateral and bilateral updates for more elastic localization.
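To give an intuition for what "elastic" boundary refinement means here, the following is a minimal toy sketch (not the official implementation; the function name, scores, and threshold are illustrative assumptions): starting from the single annotated key frame, each boundary is updated unilaterally, expanding one side at a time while neighbouring frames remain sufficiently relevant to the query.

```python
def elastic_boundaries(scores, key_idx, threshold=0.5):
    """Toy sketch: expand moment boundaries outward from one annotated frame.

    `scores` is a per-frame query-relevance sequence in [0, 1]; `key_idx` is
    the single annotated frame. Each side grows independently (a unilateral
    update) while the next frame's score stays at or above `threshold`.
    """
    left = right = key_idx
    # Unilateral update of the left boundary.
    while left > 0 and scores[left - 1] >= threshold:
        left -= 1
    # Unilateral update of the right boundary.
    while right < len(scores) - 1 and scores[right + 1] >= threshold:
        right += 1
    return left, right


scores = [0.1, 0.2, 0.7, 0.9, 1.0, 0.8, 0.6, 0.3]
print(elastic_boundaries(scores, key_idx=4))  # -> (2, 6)
```

In GOAL itself the updates are driven by the three-player game and DUT rather than a fixed threshold, but the sketch shows why a single annotated frame can still anchor a full moment.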

4. Training Data Source

The model is trained and evaluated on standard frame-supervised VMR benchmarks:

  • ActivityNet Captions
  • Charades-STA
  • TACoS

Usage & Basic Inference

This codebase provides training and evaluation scripts for frame-supervised VMR, as well as checkpoints for quick reproduction.

Step 1: Prepare the Environment

Clone the GitHub repository and install dependencies:

git clone https://github.com/iLearn-Lab/SIGIR25-GOAL.git
cd SIGIR25-GOAL
python -m venv .venv
source .venv/bin/activate   # Linux / macOS
# .venv\Scripts\activate    # Windows
pip install torch numpy scipy pyyaml tqdm   # choose the torch build matching your CUDA setup (see pytorch.org)

Step 2: Download Model Weights & Data

Prepare features and raw annotations following ViGA's dataset preparation protocol.

Before running the code, please check and replace local dataset and feature paths in:

  • src/config.yaml
  • src/utils/utils.py
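The exact keys in `src/config.yaml` depend on the repository; as a hypothetical illustration only, the path entries to replace typically look something like:

```yaml
# src/config.yaml (illustrative keys, not the repository's actual schema)
dataset:
  name: charades_sta
  annotation_dir: /path/to/annotations      # replace with your local path
  feature_dir: /path/to/video_features      # replace with your local path
```

Apply the same substitution to any hard-coded paths in `src/utils/utils.py`.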

Step 3: Run Inference

To evaluate a trained experiment folder, run:

python -m src.experiment.eval --exp path/to/your/experiment_folder

Limitations & Notes

Disclaimer: This repository is intended for academic research purposes only.

  • The model requires access to the original benchmark datasets and extracted video features for evaluation.
  • Some configuration files currently contain local path settings and should be updated before use.

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{liu2025gaming,
  title={Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval},
  author={Liu, Hao and Hu, Yupeng and Wang, Kun and Wei, Yinwei and Nie, Liqiang},
  booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2025},
  doi={10.1145/3726302.3729984}
}

Contact

If you have any questions, feel free to contact me at liuh90210@gmail.com.
