iASA: learning to annotate the semantic web

  • Authors:
  • Jie Tang;Juanzi Li;Hongjun Lu;Bangyong Liang;Xiaotong Huang;Kehong Wang

  • Affiliations:
  • Department of Computer Science, Tsinghua University, Beijing, P.R. China (all authors)

  • Venue:
  • Journal on Data Semantics IV
  • Year:
  • 2005

Abstract

With the advent of the Semantic Web, there is a great need to upgrade existing web content to semantic web content. This can be accomplished through semantic annotation. Unfortunately, manual annotation is tedious, time-consuming, and error-prone. In this paper, we propose a tool, called iASA, that learns to automatically annotate web documents according to an ontology. iASA combines information extraction (specifically, the Similarity-based Rule Learner, SRL) with machine learning techniques. Using linguistic knowledge and an optimal dynamic window size, SRL produces annotation rules of better quality than comparable semantic annotation systems. Similarity-based learning efficiently reduces the search space by avoiding pseudo rule generalization. In the annotation phase, iASA exploits ontology knowledge to refine the annotations it proposes. Moreover, our annotation algorithm exploits machine learning methods to correctly select instances and to predict missing instances. Finally, iASA provides an explanation component that explains the behavior of the learner and annotator to the user. These explanations help users understand the rule induction and annotation process, so that they can correct rules and annotations quickly. Experimental results show that iASA can reach high accuracy quickly.
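
The abstract does not spell out how SRL works, so the following is only a rough illustration of why similarity can narrow the search during rule generalization. It is not the paper's algorithm. It assumes that a rule is a fixed-length tuple of context tokens (the dynamic window size mentioned above is not modeled), that similarity is the fraction of positions where two rules agree, and that generalizing two rules means replacing their disagreements with a wildcard. All function names are hypothetical.

```python
from itertools import combinations

WILDCARD = "*"  # assumed placeholder meaning "any token"


def similarity(rule_a, rule_b):
    """Fraction of aligned positions where the two rules hold the same token."""
    agreeing = sum(a == b for a, b in zip(rule_a, rule_b))
    return agreeing / max(len(rule_a), len(rule_b))


def generalize(rule_a, rule_b):
    """Keep tokens both rules share; replace disagreements with a wildcard."""
    return tuple(a if a == b else WILDCARD for a, b in zip(rule_a, rule_b))


def most_similar_pair(rules):
    """Choose one pair to generalize instead of generalizing every pair."""
    return max(combinations(rules, 2), key=lambda pair: similarity(*pair))


def matches(rule, tokens):
    """True if every non-wildcard position of the rule equals the token there."""
    return len(rule) == len(tokens) and all(
        r == WILDCARD or r == t for r, t in zip(rule, tokens)
    )


# Toy context rules, invented for this sketch.
rules = [
    ("price", ":", "$"),
    ("price", "is", "$"),
    ("call", "us", "at"),
]

best_a, best_b = most_similar_pair(rules)
merged = generalize(best_a, best_b)
print(merged)
```

In this toy setup, merging the two "price" rules keeps two concrete tokens, whereas merging either of them with the unrelated third rule would yield a rule made only of wildcards. Skipping such dissimilar pairs is one plausible reading of "avoiding pseudo rule generalization", offered here as an assumption rather than as the authors' definition.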