Improving text classification for oral history archives with temporal domain knowledge

  • Authors:
  • J. Scott Olsson;Douglas W. Oard

  • Affiliations:
  • University of Maryland: College Park, College Park, MD;University of Maryland: College Park, College Park, MD

  • Venue:
  • SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes two new techniques for increasing the accuracy oftopic label assignment to conversational speech from oral history interviews using supervised machine learning in conjunction with automatic speech recognition. The first, time-shifted classification, leverages local sequence information from the order in which the story is told. The second, temporal label weighting, takes the complementary perspective by using the position within an interview to bias label assignment probabilities. These methods, when used in combination, yield between 6% and 15% relative improvements in classification accuracy using a clipped R-precision measure that models the utility of label sets as segment summaries in interactive speech retrieval applications.