In this paper we present a new text similarity measure for text clustering that combines the cosine measure with quantified conceptual relations by linear interpolation. The relations are derived from dictionary entries and the words in their definitions, and are quantified under the assumption that an entry and its definition are equivalent in meaning. We regard these relations as "knowledge" for text clustering. Within the framework of the k-means algorithm, the interpolated similarity significantly improves the performance of the clustering system in terms of optimizing hard and soft criterion functions. Our results suggest that introducing conceptual knowledge from an unstructured dictionary into the similarity measure can benefit text clustering.
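The linear interpolation described above can be illustrated with a minimal sketch. The abstract does not say how the conceptual relations are quantified, so everything specific here is an assumption made for illustration: the `relations` mapping (entry to definition words with a weight), the `expand` helper, the weight `lam`, and the choice to score the conceptual part as a cosine over expanded vectors. It is not the authors' method.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand(vec, relations):
    """Map each term onto the words of its dictionary definition.

    `relations[entry][word]` is a hypothetical weight linking a dictionary
    entry to a word in its definition (an assumption, not from the paper).
    """
    out = Counter()
    for term, weight in vec.items():
        for related, strength in relations.get(term, {}).items():
            out[related] += weight * strength
    return dict(out)


def interpolated_similarity(a, b, relations, lam=0.5):
    """Blend surface and conceptual similarity: lam*cos + (1-lam)*concept."""
    surface = cosine(a, b)
    concept = cosine(expand(a, relations), expand(b, relations))
    return lam * surface + (1.0 - lam) * concept


# Two documents that share no terms but whose terms share a definition word.
relations = {
    "car": {"vehicle": 1.0},
    "automobile": {"vehicle": 1.0},
}
doc_a = {"car": 1.0}
doc_b = {"automobile": 1.0}

print(cosine(doc_a, doc_b))
print(interpolated_similarity(doc_a, doc_b, relations, lam=0.5))
```

In this toy case the plain cosine is zero because the documents share no terms, while the interpolated score is positive because both terms are defined with the same word. A k-means implementation could use `interpolated_similarity` in place of the cosine when assigning documents to centroids; how `lam` should be set is left open here.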