Mining semantic relationships between concepts across documents incorporating Wikipedia knowledge

Authors:
Peng Yan; Wei Jin
Affiliations:
Department of Computer Science, North Dakota State University, Fargo, ND;Department of Computer Science, North Dakota State University, Fargo, ND
Venue:
ICDM'13 Proceedings of the 13th international conference on Advances in Data Mining: applications and theoretical aspects
Year:
2013

Citing 10
Cited 0

Ontologies Improve Text Document Clustering
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining

Text mining: generating hypotheses from MEDLINE
Journal of the American Society for Information Science and Technology

Unapparent information revelation: a concept chain graph approach
Proceedings of the 14th ACM international conference on Information and knowledge management

InfoXtract: a customizable intermediate level information extraction engine
SEALTS '03 Proceedings of the HLT-NAACL 2003 workshop on Software engineering and architecture of language technology systems - Volume 8

Measuring semantic similarity between words using web search engines
Proceedings of the 16th international conference on World Wide Web

Knowledge Discovery across Documents through Concept Chain Queries
ICDMW '06 Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops

Improving Knowledge Discovery in Document Collections through Combining Text Retrieval and Link Analysis Techniques
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining

Overcoming the brittleness bottleneck using wikipedia: enhancing text categorization with encyclopedic knowledge
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2

Computing semantic relatedness using Wikipedia-based explicit semantic analysis
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence

Improving cross-document knowledge discovery using explicit semantic analysis
DaWaK'12 Proceedings of the 14th international conference on Data Warehousing and Knowledge Discovery


Abstract

The ongoing, astounding growth of text data has created an enormous need for fast and efficient text mining algorithms. Traditional approaches to document representation are mostly based on the Bag of Words (BOW) model, which treats a document as an unordered collection of words. However, in fine-grained information discovery tasks, such as mining semantic relationships between concepts, relying solely on the BOW representation may not be sufficient to identify all potential relationships, because the resulting associations are limited to concepts that appear literally in the document collection. In this paper, we attempt to complement the information already in the corpus by proposing a new hybrid approach that mines semantic associations between concepts across multiple text units by incorporating extensive knowledge from Wikipedia. The experimental evaluation demonstrates that search performance is significantly enhanced in terms of accuracy and coverage compared with a purely BOW-based approach and with alternative solutions in which only the article contents of Wikipedia or only its category information are considered.
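
The limitation of literal BOW matching described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method: the function names, the two example sentences, and the hand-written `TOY_CONCEPTS` table are all hypothetical, and the table merely stands in for knowledge that a real system might derive from Wikipedia. The sketch only shows that two text units sharing no literal words have zero BOW similarity, and that adding concept dimensions from an external source can reveal an association.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag of words: unordered term counts; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, hand-written term-to-concept weights (an assumption for
# illustration only; not real Wikipedia data and not from the paper).
TOY_CONCEPTS = {
    "physician": {"Medicine": 1.0},
    "doctor": {"Medicine": 1.0},
    "clinic": {"Medicine": 0.8},
    "hospital": {"Medicine": 0.8},
}


def enrich(vec, concepts=TOY_CONCEPTS):
    """Return a copy of vec with extra concept dimensions added."""
    out = Counter(vec)
    for term, count in vec.items():
        for concept, weight in concepts.get(term, {}).items():
            out["concept:" + concept] += count * weight
    return out


a = bow("The physician visited the clinic")
b = bow("A doctor works at a hospital")

# No literal word overlap, so plain BOW similarity is zero.
print(cosine(a, b))
# With the shared toy concept dimension, the similarity becomes positive.
print(cosine(enrich(a), enrich(b)))
```

Keeping the literal word counts alongside the concept dimensions mirrors the "hybrid" idea only loosely; how the paper actually combines corpus evidence with Wikipedia article contents and categories is not specified in the abstract.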