The rapid, ongoing growth of text data has created a pressing need for fast and efficient text mining algorithms. Traditional approaches to document representation are mostly based on the Bag of Words (BOW) model, which treats a document as an unordered collection of words. However, in fine-grained information discovery tasks, such as mining semantic relationships between concepts, relying solely on the BOW representation may not be sufficient to identify all potential relationships, because the resulting associations are limited to concepts that appear literally in the document collection. In this paper, we complement the information in the corpus with a new hybrid approach that mines semantic associations between concepts across multiple text units by incorporating extensive knowledge from Wikipedia. The experimental evaluation demonstrates that search performance improves significantly in terms of accuracy and coverage compared with a purely BOW-based approach and with alternative solutions that consider only Wikipedia article contents or only category information.
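
The limitation of a purely literal BOW representation can be illustrated with a minimal sketch. It is not the paper's method: the function names, the two example text units, and the term-to-concept weights in TOY_CONCEPT_WEIGHTS are all hypothetical, invented here as a stand-in for knowledge that would in practice be derived from Wikipedia. The sketch only shows that two text units with no words in common have a BOW cosine similarity of zero, while a concept-level representation can still relate them.

```python
from collections import Counter, defaultdict
from math import sqrt


def bow_vector(text):
    """Bag of Words: unordered term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# HYPOTHETICAL term-to-concept weights, invented for illustration only.
# They are NOT taken from the paper or computed from Wikipedia.
TOY_CONCEPT_WEIGHTS = {
    "physician": {"Medicine": 1.0, "Hospital": 0.5},
    "doctor": {"Medicine": 1.0, "Hospital": 0.5},
    "prescribed": {"Medicine": 0.5},
    "aspirin": {"Medicine": 0.5, "Drug": 1.0},
    "painkiller": {"Drug": 1.0},
}


def concept_vector(text):
    """Project a text unit into concept space by summing per-term concept weights."""
    vec = defaultdict(float)
    for term, count in bow_vector(text).items():
        for concept, weight in TOY_CONCEPT_WEIGHTS.get(term, {}).items():
            vec[concept] += count * weight
    return dict(vec)


# Two text units that share no literal words.
unit_a = "physician prescribed aspirin"
unit_b = "doctor recommended painkiller"

bow_sim = cosine(bow_vector(unit_a), bow_vector(unit_b))
concept_sim = cosine(concept_vector(unit_a), concept_vector(unit_b))

print(f"BOW cosine similarity:     {bow_sim:.3f}")
print(f"Concept cosine similarity: {concept_sim:.3f}")
```

Because the two units share no terms, the BOW similarity is exactly zero, whereas the concept-level similarity is positive under the toy weights. A hybrid approach in the spirit of the abstract would combine both signals; how the paper weights or combines them is not specified here.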