Phrase Matching in XML

  • Authors:
  • Sihem Amer-Yahia;Mary Fernández;Divesh Srivastava;Yu Xu

  • Affiliations:
  • AT&T Labs-Research;AT&T Labs-Research;AT&T Labs-Research;UC, San Diego

  • Venue:
  • VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Phrase matching is a common IR technique to search text and identify relevant documents in a document collection. Phrase matching in XML presents new challenges as text may be interleaved with arbitrary markup, thwarting search techniques that require strict contiguity or close proximity of keywords. We present a technique for phrase matching in XML that permits dynamic specification of both the phrase to be matched and the markup to be ignored. We develop an effective algorithm for our technique that utilizes inverted indices on phrase words and XML tags. We describe experimental results comparing our algorithm to an indexed-nested loop algorithm that illustrate our algorithm's efficiency.