Towards a wrapper-driven ontology-based framework for knowledge extraction

  • Authors:
  • Jigui Sun; Xi Bai; Zehai Li; Haiyan Che; Huawen Liu

  • Affiliations:
  • College of Computer Science and Technology, Jilin University, Changchun, China and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun, China

  • Venue:
  • KSEM'07: Proceedings of the 2nd International Conference on Knowledge Science, Engineering and Management
  • Year:
  • 2007

Abstract

Because Web resources are formatted in diverse ways for human viewing, traditional information extraction techniques achieve unsatisfactory accuracy, and the information they extract is inconvenient for users to query. This paper proposes WebKER, a wrapper-driven system that extracts knowledge from Chinese Web pages based on domain ontologies. Wrappers are first learned using suffix arrays. A novel HowNet-based approach then automatically aligns the raw data extracted by the wrappers. Next, knowledge is generated and described with Resource Description Framework (RDF) statements. Finally, the knowledge is merged and added to the Knowledge Base (KB). A prototype of WebKER has been implemented. The experiments report the system's performance and compare querying information stored in the KB with querying information extracted by traditional techniques; the results indicate the superiority of our system. In addition, evaluations of the wrapper and of the knowledge-merging method are presented.
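The abstract says only that wrappers are learned "through suffix arrays". The sketch below shows one common way a suffix array exposes repeated structure in a page, under assumptions of our own rather than the paper's stated algorithm: the page is assumed to be reduced to a sequence of HTML tag tokens with text collapsed to a hypothetical `TEXT` placeholder, and the record template a wrapper needs is taken to be the longest token run repeated at least `min_count` times. All function names are hypothetical.

```python
from typing import List

# Assumption (not from the paper): a page is a list of tag tokens, text nodes -> "TEXT".

def suffix_array(tokens: List[str]) -> List[int]:
    """Start positions of all suffixes, sorted lexicographically (naive construction)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def lcp_array(tokens: List[str], sa: List[int]) -> List[int]:
    """Kasai et al.: lcp[r] = common-prefix length of suffixes sa[r-1] and sa[r]; lcp[0] = 0."""
    n = len(tokens)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
            while i + h < n and j + h < n and tokens[i + h] == tokens[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix keeps at least h - 1 matching tokens
        else:
            h = 0
    return lcp

def longest_repeated_pattern(tokens: List[str], min_count: int = 2) -> List[str]:
    """Longest token run occurring at least min_count times (occurrences may overlap)."""
    if min_count < 2:
        raise ValueError("min_count must be at least 2")
    sa = suffix_array(tokens)
    lcp = lcp_array(tokens, sa)
    best_len, best_start = 0, 0
    # Suffixes sharing a prefix form a contiguous block of the suffix array, so it
    # suffices to scan windows of min_count adjacent suffixes; a window's shared
    # prefix length is the minimum LCP inside it.
    for r in range(len(sa) - min_count + 1):
        length = min(lcp[r + 1 : r + min_count])
        if length > best_len:
            best_len, best_start = length, sa[r]
    return tokens[best_start : best_start + best_len]

# Hypothetical tag-token sequence of a list page holding three records.
page = ["<ul>"] + ["<li>", "<b>", "TEXT", "</b>", "TEXT", "</li>"] * 3 + ["</ul>"]
print(longest_repeated_pattern(page, min_count=3))
# ['<li>', '<b>', 'TEXT', '</b>', 'TEXT', '</li>']
```

The naive `sorted` construction keeps the sketch short; a production wrapper learner on long pages would use a linear or O(n log n) suffix-array construction instead.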
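The final steps, describing extracted knowledge as RDF statements and merging it into the KB, can be illustrated with a minimal sketch. It rests on our own assumptions: the namespace, class and property names, and the example record values are hypothetical; records are assumed to be already aligned to ontology properties; and merging is reduced to set union, which discards only exact duplicate statements. The abstract does not describe the paper's merging method, which may resolve conflicts this sketch ignores.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical ontology namespace; the abstract does not name WebKER's ontology URIs.
NS = "http://example.org/onto#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def record_to_triples(record_id: str, cls: str, fields: Dict[str, str]) -> Set[Triple]:
    """Describe one aligned record as (subject, predicate, object) RDF statements."""
    subject = f"<{NS}{record_id}>"
    triples = {(subject, RDF_TYPE, f"<{NS}{cls}>")}
    for prop, value in fields.items():
        # Plain literals, quoted as in N-Triples; escaping is omitted in this sketch.
        triples.add((subject, f"<{NS}{prop}>", f'"{value}"'))
    return triples

def merge_into_kb(kb: Set[Triple], new: Iterable[Triple]) -> int:
    """Add statements to the KB by set union; return how many were not already present."""
    before = len(kb)
    kb.update(new)
    return len(kb) - before

def to_ntriples(kb: Set[Triple]) -> str:
    """Serialize the KB one statement per line, N-Triples style."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(kb))

# Hypothetical records for the same entity extracted from two pages.
kb: Set[Triple] = set()
print(merge_into_kb(kb, record_to_triples("book1", "Book", {"title": "Data Mining", "price": "45"})))  # 3
print(merge_into_kb(kb, record_to_triples("book1", "Book", {"title": "Data Mining", "publisher": "Pub A"})))  # 1
print(to_ntriples(kb))
```

Using a set of tuples makes duplicate statements from overlapping pages collapse automatically; an RDF library such as rdflib offers the same set semantics for a graph.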