Login Paper Search My Schedule Paper Index Help

My ICIP 2021 Schedule

Note: Your custom schedule will not be saved unless you create a new account or login to an existing account.
  1. Create a login based on your email (takes less than one minute)
  2. Perform 'Paper Search'
  3. Select papers that you desire to save in your personalized schedule
  4. Click on 'My Schedule' to see the current list of selected papers
  5. Click on 'Printable Version' to create a separate window suitable for printing (the header and menu will appear, but will not actually print)

Paper Detail

Paper IDMLR-APPL-IVSMR-3.10
Paper Title Weakly-supervised Moment Retrieval Network for Video Corpus Moment Retrieval
Authors Sunjae Yoon, Dahyun Kim, Ji Woo Hong, Junyeong Kim, Kookhoi Kim, Chang D. Yoo, Korea Advanced Institute of Science and Technology, Republic of Korea
SessionMLR-APPL-IVSMR-3: Machine learning for image and video sensing, modeling and representation 3
LocationArea D
Session Time:Wednesday, 22 September, 14:30 - 16:00
Presentation Time:Wednesday, 22 September, 14:30 - 16:00
Presentation Poster
Topic Applications of Machine Learning: Machine learning for image & video sensing, modeling, and representation
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Abstract This paper proposes Weakly-supervised Moment Retrieval Network (WMRN) for Video Corpus Moment Retrieval (VCMR), which retrieves pertinent temporal moments related to natural language query in a large video corpus. Previous methods for VCMR require full supervision of temporal boundary information for training, which involves a labor-intensive process of annotating the boundaries in a large number of videos. To leverage this, the proposed WMRN performs VCMR in a weakly-supervised manner, where WMRN is learned without ground-truth labels but only with video and text queries. For weakly-supervised VCMR, WMRN addresses the following two limitations of prior methods: (1) Blurry attention over video features due to redundant video candidate proposals generation, (2) Insufficient learning due to weak supervision only with video-query pairs. To this end, WMRN is based on (1) Text Guided Proposal Generation (TGPG) that effectively generates text guided multi-scale video proposals in the prospective region related to query, and (2) Hard Negative Proposal Sampling (HNPS) that enhances video-language alignment via extracting negative video proposals in positive video sample for contrastive learning. Experimental results show that WMRN achieves state-of-the-art performance on TVR and DiDeMo benchmarks in the weakly-supervised setting. To validate the attainments of proposed components of WMRN, comprehensive ablation studies and qualitative analysis are conducted.