Unsupervised feature weighting based on local feature relatedness

Authors:
Jiali Yun;Liping Jing;Jian Yu;Houkuan Huang
Affiliations:
School of Computer and Information Technology, Beijing Jiaotong University, China;School of Computer and Information Technology, Beijing Jiaotong University, China;School of Computer and Information Technology, Beijing Jiaotong University, China;School of Computer and Information Technology, Beijing Jiaotong University, China
Venue:
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Year:
2011

Citing 10
Cited 0

Term-weighting approaches in automatic text retrieval

Information Processing and Management: an International Journal
Document clustering using word clusters via the information bottleneck method

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data

Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data
Wikify!: linking documents to encyclopedic knowledge

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Building semantic kernels for text classification using wikipedia

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Improving Text Classification by Using Encyclopedia Knowledge

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Clustering Documents with Active Learning Using Wikipedia

ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Supervised and Traditional Term Weighting Methods for Automatic Text Categorization

IEEE Transactions on Pattern Analysis and Machine Intelligence
Clustering Documents Using a Wikipedia-Based Concept Representation

PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Exploiting Wikipedia as external knowledge for document clustering

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Feature weighting plays an important role in text clustering. Traditional feature weighting is determined by the syntactic relationship between feature and document (e.g. TF-IDF). In this paper, a semantically enriched feature weighting approach is proposed by introducing the semantic relationship between feature and document, which is implemented by taking account of the local feature relatedness -- the relatedness between feature and its contextual features within each individual document. Feature relatedness is measured by two methods, document collection-based implicit relatedness measure and Wikipedia link-based explicit relatedness measure. Experimental results on benchmark data sets show that the new feature weighting approach surpasses traditional syntactic feature weighting. Moreover, clustering quality can be further improved by linearly combining the syntactic and semantic factors. The new feature weighting approach is also compared with two existing feature relatedness-based approaches which consider the global feature relatedness (feature relatedness in the entire feature space) and the interdocument feature relatedness (feature relatedness between different documents) respectively. In the experiments, the new feature weighting approach outperforms these two related work in clustering quality and costs much less computational complexity.