Recognizing Ontology-Applicable Multiple-Record Web Documents

  • Authors:
  • David W. Embley; Yiu-Kai Ng; Li Xu

  • Venue:
  • ER '01 Proceedings of the 20th International Conference on Conceptual Modeling: Conceptual Modeling
  • Year:
  • 2001

Abstract

Automatically recognizing which Web documents are "of interest" for some specified application is non-trivial. As a step toward solving this problem, we propose a technique for recognizing which multiple-record Web documents apply to an ontologically specified application. Given the values and kinds of values recognized by an ontological specification in an unstructured Web document, we apply three heuristics: (1) a density heuristic that measures the percent of the document that appears to apply to an application ontology, (2) an expected-value heuristic that compares the number and kind of values found in a document to the number and kind expected by the application ontology, and (3) a grouping heuristic that considers whether the values of the document appear to be grouped as application-ontology records. Then, based on machine-learned rules over these heuristic measurements, we determine whether a Web document is applicable for a given ontology. Our experimental results show that we have been able to achieve over 90% for both recall and precision, with an F-measure of about 95%.
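
The abstract names the three heuristics without giving their formulas, so the following is only an illustrative sketch of how such measurements might be computed, not the authors' implementation. All function names, the character-span representation of matches, the use of cosine similarity for the expected-value comparison, and the fixed-size chunking in the grouping score are assumptions made for this example. The F-measure helper uses the standard harmonic mean of precision and recall.

```python
import math


def density(doc_length, matched_spans):
    """Assumed density measure: fraction of document characters that fall
    inside spans recognized by the ontology. Overlapping spans count once."""
    if doc_length <= 0:
        return 0.0
    covered = set()
    for start, end in matched_spans:  # half-open [start, end) offsets
        covered.update(range(max(0, start), min(end, doc_length)))
    return len(covered) / doc_length


def expected_value_similarity(observed, expected):
    """Assumed expected-value measure: cosine similarity between the observed
    per-field value counts and the counts the ontology would expect."""
    fields = sorted(set(observed) | set(expected))
    o = [float(observed.get(f, 0)) for f in fields]
    e = [float(expected.get(f, 0)) for f in fields]
    dot = sum(a * b for a, b in zip(o, e))
    norm = math.sqrt(sum(a * a for a in o)) * math.sqrt(sum(b * b for b in e))
    return dot / norm if norm else 0.0


def grouping_score(field_sequence, record_fields):
    """Assumed grouping measure: cut the in-order sequence of recognized
    fields into chunks of len(record_fields) and report the average share
    of distinct fields per full chunk (1.0 means record-like grouping)."""
    n = len(record_fields)
    if n == 0:
        return 0.0
    relevant = [f for f in field_sequence if f in record_fields]
    chunks = [relevant[i:i + n] for i in range(0, len(relevant), n)]
    full = [c for c in chunks if len(c) == n]
    if not full:
        return 0.0
    return sum(len(set(c)) for c in full) / (n * len(full))


def f_measure(precision, recall):
    """Standard F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In the approach the abstract describes, measurements like these are fed to machine-learned rules that make the applicability decision. The rule learner itself is not sketched here because the abstract gives no details about it.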