Converting the syntactic structures of hierarchical data to their semantic structures

  • Authors: Seung-Jin Lim; Yiu-Kai Ng

  • Affiliations: Brigham Young Univ., Provo, UT; Brigham Young Univ., Provo, UT

  • Venue: Information organization and databases
  • Year: 2000

Abstract

Most existing methods for retrieving sub-page-level information from an HTML document either rely on keyword-based searching or assume that the document's internal structure is known beforehand. These techniques are therefore ill-suited to locating hierarchically organized information, especially when the internal structure of a given HTML document D is unknown. We present an approach for inferring the information hierarchy at the sub-page level of D. The approach constructs a metadata model of D, called a content tree (CT), which captures the hierarchical relationships among the data contents of D. Hierarchical information in D can then be retrieved via the CT using an existing semistructured query language.
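
The abstract does not say how the content tree is inferred, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes, purely for illustration, that heading tags (h1 to h6) signal hierarchy levels and that text in p, li, and td elements forms leaf content. The names Node, ContentTreeBuilder, build_content_tree, and lookup are hypothetical, and lookup stands in for a real semistructured query language.

```python
from html.parser import HTMLParser

# Assumption: heading tags are the only hierarchy cue used in this sketch.
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}
LEAF_TAGS = {"p", "li", "td"}


class Node:
    """One node of the (hypothetical) content tree."""

    def __init__(self, label, level):
        self.label = label
        self.level = level
        self.children = []


class ContentTreeBuilder(HTMLParser):
    """Builds a tree in which each heading nests under the nearest shallower heading."""

    def __init__(self):
        super().__init__()
        self.root = Node("ROOT", 0)
        self.stack = [self.root]      # path from the root to the current heading
        self.pending_level = None     # set while inside a heading element
        self.buffer = []

    def _take_text(self):
        # Collapse whitespace and empty the buffer.
        text = " ".join("".join(self.buffer).split())
        self.buffer = []
        return text

    def _flush_leaf(self):
        # Attach buffered text as a leaf under the current heading.
        text = self._take_text()
        if text:
            self.stack[-1].children.append(Node(text, 7))

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self._flush_leaf()
            self.pending_level = HEADING_LEVELS[tag]

    def handle_data(self, data):
        self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS and self.pending_level is not None:
            node = Node(self._take_text(), self.pending_level)
            # Pop headings at the same or a deeper level; the root (level 0) always stays.
            while self.stack[-1].level >= node.level:
                self.stack.pop()
            self.stack[-1].children.append(node)
            self.stack.append(node)
            self.pending_level = None
        elif tag in LEAF_TAGS and self.pending_level is None:
            self._flush_leaf()

    def close(self):
        super().close()
        self._flush_leaf()


def build_content_tree(html):
    builder = ContentTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def lookup(node, path):
    """Follow heading labels from the root and return the labels one level below."""
    for label in path:
        node = next(child for child in node.children if child.label == label)
    return [child.label for child in node.children]
```

With a made-up page such as `<h1>Courses</h1><h2>Fall</h2><p>CS 101</p><p>CS 235</p><h2>Winter</h2><p>CS 240</p>`, calling `lookup(build_content_tree(page), ["Courses", "Fall"])` returns the two Fall entries without the caller knowing the page layout in advance. Real pages rarely mark hierarchy this cleanly, which is the gap an inference approach has to close.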