On foreign name search

  • Authors: Jason Soo; Ophir Frieder

  • Affiliations: Information Retrieval Laboratory, Illinois Institute of Technology; Department of Computer Science, Georgetown University

  • Venue: ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
  • Year: 2010

Abstract

We address foreign name search in a highly diverse user community. User sophistication ranges from highly experienced archivists to apprehensive users who shy away from technology; apprehensive users dominate system use. Thus, all system interfaces must assume minimal dependency on the user. Our foreign name search approach, called Segments, is language independent; thus, there is no need to determine the language of origin from the diverse candidate set of thirteen languages. We compare Segments against traditional n-gram and Soundex-based solutions. Actual and synthetic queries are used to search a names data set resident in the United States Holocaust Memorial Museum. We also search a subset of the 1990 United States Census Bureau Surnames data set to evaluate the performance of Segments on a predominantly language-specific (English) collection. Our results demonstrate statistically significant performance gains over both traditional approaches. The described approach supports search efforts at the United States Holocaust Memorial Museum.
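
For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of what n-gram and Soundex name matching can look like. It is not the Segments method, which the abstract does not describe, and it is not necessarily the variant the authors evaluated: the bigram size, the `#` padding, the Dice coefficient, and the function names `soundex`, `ngrams`, `dice`, and `rank` are all assumptions made for illustration.

```python
# Illustrative baselines only; parameters and names are assumptions,
# not taken from the paper.

# Standard American Soundex letter-to-digit table.
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex code of an ASCII name."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # vowels reset prev, so repeated codes across them count
    return (encoded + "000")[:4]


def ngrams(name, n=2):
    """Return the set of padded character n-grams of a name."""
    pad = "#" * (n - 1)
    padded = pad + name.lower() + pad
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient over the n-gram sets of two names (0.0 to 1.0)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def rank(query, candidates, n=2):
    """Sort candidate names by descending n-gram similarity to the query."""
    return sorted(candidates, key=lambda c: dice(query, c, n), reverse=True)
```

Soundex encodes English-oriented phonetic classes, so `soundex("Robert")` and `soundex("Rupert")` collide while spellings from other languages may not; n-gram overlap makes no phonetic assumption but depends on shared character sequences. A language-independent approach such as the one the abstract describes is compared against both.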