Mining semantic relationships between concepts across documents incorporating Wikipedia knowledge

  • Authors:
  • Peng Yan; Wei Jin

  • Affiliations:
  • Department of Computer Science, North Dakota State University, Fargo, ND; Department of Computer Science, North Dakota State University, Fargo, ND

  • Venue:
  • ICDM'13 Proceedings of the 13th international conference on Advances in Data Mining: applications and theoretical aspects
  • Year:
  • 2013

Abstract

The ongoing, astounding growth of text data has created an enormous need for fast and efficient text mining algorithms. Traditional approaches to document representation are mostly based on the Bag of Words (BOW) model, which treats a document as an unordered collection of words. However, in fine-grained information discovery tasks such as mining semantic relationships between concepts, relying solely on the BOW representation may not be sufficient to identify all potential relationships, because the resulting associations are limited to concepts that appear literally in the document collection. In this paper, we attempt to complement the existing information in the corpus by proposing a new hybrid approach that mines semantic associations between concepts across multiple text units by incorporating extensive knowledge from Wikipedia. The experimental evaluation demonstrates that search performance is significantly enhanced in terms of accuracy and coverage compared with a purely BOW-based approach and with alternative solutions that consider only Wikipedia article contents or only category information.
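The BOW limitation described in the abstract can be illustrated with a small sketch. This is not the authors' method: the `BACKGROUND` dictionary is a hypothetical, hand-written stand-in for knowledge that might be drawn from Wikipedia, and the `bow`, `cosine`, and `enrich` helpers are names invented here for illustration. The sketch only shows that two text units with no literal word overlap score zero under plain BOW, and that adding related concepts to each vector can surface an association.

```python
from collections import Counter
import math


def bow(text):
    """Bag of Words: an unordered multiset of lowercase tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical background knowledge (assumption): term -> related concepts.
# A real system would derive such links from an external resource.
BACKGROUND = {
    "aspirin": ["analgesic", "drug"],
    "ibuprofen": ["analgesic", "drug"],
}


def enrich(vec, knowledge):
    """Return a copy of vec with related concepts added as extra dimensions."""
    out = Counter(vec)
    for term, count in vec.items():
        for concept in knowledge.get(term, []):
            out[concept] += count
    return out


doc_a = bow("aspirin relieves headache")
doc_b = bow("ibuprofen reduces fever")

# No token is shared literally, so plain BOW finds no association.
plain = cosine(doc_a, doc_b)

# After enrichment both vectors share the added concept dimensions.
enriched = cosine(enrich(doc_a, BACKGROUND), enrich(doc_b, BACKGROUND))

print(f"plain BOW similarity: {plain:.2f}")
print(f"enriched similarity:  {enriched:.2f}")
```

With these toy inputs the plain similarity is zero while the enriched similarity is positive, which is the kind of gap the abstract attributes to relying on literal word overlap alone.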