Efficient extraction of ontologies from domain specific text corpora

  • Authors:
  • Tianyu Li;Pirooz Chubak;Laks V.S. Lakshmanan;Rachel Pottinger

  • Affiliations:
  • University of British Columbia, Vancouver, BC, Canada;University of British Columbia, Vancouver, BC, Canada;University of British Columbia, Vancouver, BC, Canada;University of British Columbia, Vancouver, BC, Canada

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Extracting ontological relationships (e.g., ISA and HASA) from free-text repositories (e.g., engineering documents and instruction manuals) can improve users' queries, as well as benefit applications built for these domains. Current methods to extract ontologies from text usually miss many meaningful relationships because they either concentrate on single-word terms and short phrases or neglect syntactic relationships between concepts in sentences. We propose a novel pattern-based algorithm to find ontological relationships between complex concepts by exploiting parsing information to extract multi-word concepts and nested concepts. Our procedure is iterative: we tailor the constrained sequential pattern mining framework to discover new patterns. Our experiments on three real data sets show that our algorithm consistently and significantly outperforms previous representative ontology extraction algorithms.