Annotating Wikipedia articles with semantic tags for structured retrieval

  • Authors:
  • Saravadee Sae Tan;Tang Enya Kong;Gian Chand Sodhy

  • Affiliations:
  • Multimedia University, Cyberjaya, Malaysia;Multimedia University, Cyberjaya, Malaysia;Universiti Sains Malaysia, Penang, Malaysia

  • Venue:
  • Proceedings of the 2nd ACM workshop on Social web search and mining
  • Year:
  • 2009

Abstract

Structured retrieval exploits both the content and the structure of documents to improve information retrieval. The availability of semantic structure in the documents is therefore an important factor in its success. However, the majority of documents on the Web still lack semantically rich structure. This motivates us to automatically identify the semantic information in web documents and explicitly annotate it with semantic tags. Based on the well-known Wikipedia corpus, this paper describes an unsupervised learning approach to identify the conceptual information and descriptive information of an entity described in a Wikipedia article. Our approach utilizes the Wikipedia link structure and Infobox information to learn the semantic structure of Wikipedia articles. We also describe a lazy approach used in the learning process: by utilizing the Wikipedia categories provided by the contributors, only a subset of the entities in a Wikipedia category is used as training data, and the results can be applied to the rest of the entities in the category.
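
The lazy, category-level idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes, purely for illustration, that each training entity is represented by its Infobox as a dictionary, that an attribute is kept when it appears in a minimum share of the training entities, and that a sentence is tagged when it mentions a kept attribute name. The function names `learn_category_schema` and `annotate`, the `min_support` parameter, and the sample data are all hypothetical.

```python
from collections import Counter


def learn_category_schema(training_infoboxes, min_support=0.5):
    """Keep Infobox attribute names shared by enough training entities.

    Hypothetical stand-in for the learning step: an attribute is kept
    when it occurs in at least `min_support` of the training Infoboxes.
    """
    counts = Counter()
    for infobox in training_infoboxes:
        counts.update(set(infobox))  # count each attribute once per entity
    threshold = min_support * len(training_infoboxes)
    return {attr for attr, n in counts.items() if n >= threshold}


def annotate(sentences, schema):
    """Wrap a sentence in a tag when it mentions a learned attribute name."""
    annotated = []
    for sentence in sentences:
        lowered = sentence.lower()
        # Sorted for a deterministic choice when several attributes match.
        tag = next((attr for attr in sorted(schema) if attr in lowered), None)
        annotated.append(f"<{tag}>{sentence}</{tag}>" if tag else sentence)
    return annotated


# Toy data: three entities of one category serve as the training subset.
training = [
    {"capital": "Kuala Lumpur", "population": "28 million", "currency": "Ringgit"},
    {"capital": "Bangkok", "population": "66 million"},
    {"capital": "Jakarta", "population": "230 million", "anthem": "Indonesia Raya"},
]
schema = learn_category_schema(training, min_support=0.6)

# An entity from the same category that was not in the training subset.
unseen = ["The capital is Hanoi.", "It lies in Southeast Asia."]
result = annotate(unseen, schema)
```

Here the schema is learned once from the three training entities and then reused for the unseen entity without further learning, which is the sense in which the sketch is lazy.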