Wikipedia mining for an association web thesaurus construction

  • Authors:
  • Kotaro Nakayama; Takahiro Hara; Shojiro Nishio

  • Affiliations:
  • Dept. of Multimedia Eng., Graduate School of Information Science and Technology, Osaka University, Suita, Osaka, Japan (all three authors)

  • Venue:
  • WISE'07 Proceedings of the 8th international conference on Web information systems engineering
  • Year:
  • 2007

Abstract

Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has several impressive characteristics: a huge number of articles, live updates, a dense link structure, brief link texts, and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency - Inversed Backward link Frequency), and an extension, "forward / backward link weighting (FB weighting)", for constructing a huge-scale association thesaurus. We demonstrated the effectiveness of the proposed methods in comparison with conventional methods such as co-occurrence analysis and TF-IDF.