Segmentation strategies for passage retrieval in audio-visual documents

  • Authors:
  • Petra Galuščáková

  • Affiliations:
  • Charles University in Prague, Faculty of Mathematics and Physics, Prague, Czech Rep

  • Venue:
  • Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

The importance of Information Retrieval (IR) in audio-visual recordings has been increasing with steeply growing numbers of audio-visual documents available on-line. Compared to traditional IR methods, this task requires specific techniques, such as Passage Retrieval which can accelerate the search process by retrieving the exact relevant passage of a recording instead of the full document. In Passage Retrieval, full recordings are divided into shorter segments which serve as individual documents for the further IR setup. This technique also allows normalizing document length and applying positional information. It was shown that it can even improve retrieval results. In this work, we examine two general strategies for Passage Retrieval: blind segmentation into overlapping regular-length passages and segmentation into variable-length passages based on semantics of their content. Time-based segmentation was already shown to improve retrieval of textual documents and audio-visual recordings. Our experiments performed on the test collection used in the Search subtask of the Search and Hyperlinking Task in MediaEval Benchmarking 2012 confirm those findings and show that parameters (segment length and shift) tuning for a specific test collection can further improve the results. Our best results on this collection were achieved by using 45-second long segments with 15-second shifts. Semantic-based segmentation can be divided into three types: similarity-based (producing segments with high intra-similarity and low inter-similarity), lexical-chain-based (producing segments with frequent lexically connected words), and feature-based (combining various features which signalize a segment break in a machine-learning setting). In this work, we mainly focus on feature-based segmentation which allows exploiting various features from all modalities of the data (including segment length) in a single trainable model and produces segments which can eventually overlap. Our preliminary results show that even simple semantic-based segmentation outperforms regular segmentation. Our model is a decision tree incorporating the following features: shot segments, output of TextTiling algorithm, cue words (well, thanks, so, I, now), sentence breaks, and the length of the silence after the previous word. In terms of the MASP, the relative improvement over regular segmentation is more than 19%.