The dictionary-based quantified conceptual relations for hard and soft Chinese text clustering

Authors:
Yi Hu;Ruzhan Lu;Yuquan Chen;Hui Liu;Dongyi Zhang
Affiliations:
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China;Network Management Center, Politics College of Xi'an, Xi'an, China
Venue:
NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
Year:
2007

Citing 10
Cited 1

Data mining: concepts and techniques

Data mining: concepts and techniques
A vector space model for automatic indexing

Communications of the ACM
Using text processing techniques to automatically enrich a domain ontology

Proceedings of the international conference on Formal Ontology in Information Systems - Volume 2001
Machine Learning

Machine Learning
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
Noun classification from predicate-argument structures

ACL '90 Proceedings of the 28th annual meeting on Association for Computational Linguistics
Text Classification by Boosting Weak Learners based on Terms and Concepts

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Automatic construction of a hypernym-labeled noun hierarchy from text

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Learning concept hierarchies from text corpora using formal concept analysis

Journal of Artificial Intelligence Research
Subspace clustering of text documents with feature weighting k-means algorithm

PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining

Word Similarity Based on an Ensemble Model Using Ranking SVMs

WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we present a new similarity of text on the basis of combining cosine measure with the quantified conceptual relations by linear interpolation for text clustering. These relations derive from the entries and the words in their definitions in a dictionary, which are quantified under the assumption that the entries and their definitions are equivalent in meaning. This kind of relations is regarded as "knowledge" for text clustering. Under the framework of k-means algorithm, the new interpolated similarity improves the performance of clustering system significantly in terms of optimizing hard and soft criterion functions. Our results show that introducing the conceptual knowledge from the un-structured dictionary into the similarity measure tends to provide potential contributions for text clustering in future.