Indexing emails and email threads for retrieval

  • Authors:
  • Yejun Wu;Douglas W. Oard

  • Affiliations:
  • University of Maryland, College Park, MD;University of Maryland, College Park, MD

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Electronic mail poses a number of unusual challenges for the design of information retrieval systems and test collections, including informal expression, conversational structure, variable document granularity (e.g., messages, threads, or longer-term interactions), a naturally occuring integration between free text and structural metadata, and incompletely characterized user needs. This paper reports on initial experiments with a large collection of public mailing lists from the World Wide Web consortium that will be used for the TREC 2005 Enterprise Search Track. Automatic subject-line threading and removal of duplicated text were found to have little effect in a small pilot study. Those observations motivated development of a question typology and more detailed analysis of collection characteristics; preliminary results for both are reported.