XML-aided phrase indexing for hypertext documents

  • Authors: Miro Lehtonen; Antoine Doucet
  • Affiliations: University of Helsinki, Helsinki, Finland; University of Caen, Caen, France
  • Venue: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
  • Year: 2008

Abstract

We combine techniques of XML Mining and Text Mining for the benefit of Information Retrieval. By manipulating the word sequence according to the XML structure of the marked-up text, we strengthen phrase boundaries so that they are more obvious to the algorithms that extract multiword sequences from text. Consequently, the quality of the indexed phrases improves, which has a positive effect on the average precision measured by the INEX 2007 standards.
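
The abstract does not spell out how the word sequence is manipulated, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that inline elements (a hypothetical `INLINE_TAGS` set) leave the word sequence unbroken, while every other element inserts a hypothetical `BOUNDARY` marker that a phrase extractor must not cross. The function names and the use of bigrams as the "multiword sequences" are likewise illustrative choices.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: these tags are inline and do not interrupt the running text.
INLINE_TAGS = {"a", "b", "em", "i"}
# Hypothetical marker that no extracted phrase may span.
BOUNDARY = "<B>"


def words(text):
    """Split a text fragment into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def tokens_with_boundaries(elem):
    """Yield words in document order, wrapping non-inline elements in BOUNDARY."""
    inline = elem.tag in INLINE_TAGS
    if not inline:
        yield BOUNDARY
    if elem.text:
        yield from words(elem.text)
    for child in elem:
        yield from tokens_with_boundaries(child)
        # Text that follows the child still belongs to the current element.
        if child.tail:
            yield from words(child.tail)
    if not inline:
        yield BOUNDARY


def bigrams(tokens):
    """Return adjacent word pairs that do not touch a boundary marker."""
    seq = list(tokens)
    return [(a, b) for a, b in zip(seq, seq[1:]) if BOUNDARY not in (a, b)]


if __name__ == "__main__":
    doc = (
        "<sec><title>Phrase indexing</title>"
        "<p>Hypertext documents use <i>inline markup</i> often</p></sec>"
    )
    for pair in bigrams(tokens_with_boundaries(ET.fromstring(doc))):
        print(pair)
```

In this sketch the title and the paragraph are separated by markers, so the spurious pair ("indexing", "hypertext") that a flat text stream would produce is never extracted, while ("use", "inline") survives because `<i>` is treated as inline.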