Learning ontology resolution for document representation and its applications in text mining

Authors:
Lidong Bing;Bai Sun;Shan Jiang;Yan Zhang;Wai Lam
Affiliations:
Peking University & The Chinese University of Hong Kong, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;The Chinese University of Hong Kong, Hong Kong, Hong Kong
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Abstract

It is well known that synonymous and polysemous terms often introduce noise when calculating the similarity between documents. Existing ontology-based document representation methods are static: the semantic concept set chosen to represent a document has a fixed resolution and cannot adapt to the characteristics of a document collection or to the text mining problem at hand. We propose an Adaptive Concept Resolution (ACR) model to overcome this issue. ACR learns a concept border from an ontology, taking into consideration the characteristics of a particular document collection. This border then provides a tailor-made semantic concept representation for documents from the same domain. Another advantage of ACR is that it applies to both classification tasks, where the groups are given in the training document set, and clustering tasks, where no group information is available. Furthermore, the results of this model are not sensitive to the model parameter. The experimental results show that ACR significantly outperforms an existing static method.
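The abstract does not describe how ACR learns its border, so the following is only a minimal sketch of what a fixed concept border does to a bag-of-words representation, assuming a tree-shaped ontology where each concept has one parent. The toy hierarchy `PARENT`, the helpers `map_to_border`, `concept_vector` and `cosine`, and the example documents are all hypothetical; this illustrates concept resolution, not the ACR learning procedure itself.

```python
import math
from collections import Counter

# Hypothetical toy ontology (child -> parent), standing in for a slice of a
# hypernym hierarchy such as WordNet's. Not taken from the paper.
PARENT = {
    "car": "motor_vehicle",
    "automobile": "motor_vehicle",
    "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "apple": "fruit",
    "pear": "fruit",
    "fruit": "food",
}


def map_to_border(term, border, parent=PARENT):
    """Walk up from `term` and return the first ancestor (or the term itself)
    that lies on the concept border; keep the raw term if none does."""
    node = term
    while node is not None:
        if node in border:
            return node
        node = parent.get(node)
    return term


def concept_vector(tokens, border):
    """Bag-of-concepts: count each token after mapping it onto the border."""
    return Counter(map_to_border(t, border) for t in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


doc_a = ["car", "truck", "apple"]
doc_b = ["automobile", "truck", "pear"]
doc_c = ["bicycle", "bicycle", "pear"]

for border in (set(), {"motor_vehicle", "fruit"}, {"vehicle", "food"}):
    va, vb, vc = (concept_vector(d, border) for d in (doc_a, doc_b, doc_c))
    print(sorted(border), round(cosine(va, vb), 3), round(cosine(va, vc), 3))
```

With an empty border (plain terms), the synonyms "car"/"automobile" and "apple"/"pear" do not match, so `doc_a` and `doc_b` look only partly similar. The border `{"motor_vehicle", "fruit"}` merges them while keeping bicycles apart, whereas the coarser `{"vehicle", "food"}` also makes `doc_a` and `doc_c` look identical. That trade-off is why a single fixed resolution may not suit every collection or task, which is the motivation the abstract gives for learning the border.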