Feature selection has been widely applied in text categorization and clustering. Compared to unsupervised selection, supervised feature selection is more successful at filtering out noise in most cases. However, because label information is lacking, clustering can hardly exploit supervised selection. Some studies have proposed to solve this problem with a "pseudoclass." As empirical results show, this method is sensitive to the selection criterion and the data set. In this paper, we propose a novel feature coselection method for Web document clustering, called Multitype Features Coselection for Clustering (MFCC). MFCC uses intermediate clustering results in one type of feature space to help the selection in other types of feature spaces. Our experiments show that, for most selection criteria, MFCC effectively reduces the noise introduced by the "pseudoclass" and further improves clustering performance.
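The coselection idea can be illustrated with a minimal sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: the two feature spaces ("text" and "link"), the tiny k-means, the chi-square criterion, the `keep` and `rounds` parameters, and the toy data. Cluster labels obtained in one feature space serve as pseudoclasses for scoring and pruning the features of the other space, and the roles then alternate.

```python
def kmeans(vectors, k, iters=10):
    """Tiny k-means on dense rows; deterministic init from evenly spaced rows."""
    step = max(1, len(vectors) // k)  # assumes len(vectors) >= k
    centroids = [list(vectors[i * step]) for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each document to its nearest centroid (squared Euclidean).
        for i, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def chi2_scores(vectors, labels, k):
    """Chi-square of binary feature presence vs. pseudoclass, max over classes."""
    n = len(vectors)
    scores = []
    for j in range(len(vectors[0])):
        best = 0.0
        for c in range(k):
            n11 = sum(1 for v, lab in zip(vectors, labels) if v[j] > 0 and lab == c)
            n10 = sum(1 for v, lab in zip(vectors, labels) if v[j] > 0 and lab != c)
            n01 = sum(1 for v, lab in zip(vectors, labels) if v[j] == 0 and lab == c)
            n00 = n - n11 - n10 - n01
            denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
            if denom:
                best = max(best, n * (n11 * n00 - n10 * n01) ** 2 / denom)
        scores.append(best)
    return scores


def coselect(spaces, k, keep, rounds=2):
    """Alternate: cluster in one space, use the labels to select features in the others."""
    selected = {name: list(range(len(m[0]))) for name, m in spaces.items()}
    labels = None
    for _ in range(rounds):
        for src in spaces:
            # Cluster using only the features currently kept in the source space.
            reduced = [[row[j] for j in selected[src]] for row in spaces[src]]
            labels = kmeans(reduced, k)
            for dst in spaces:
                if dst != src:
                    # Pseudoclasses from `src` drive supervised-style selection in `dst`.
                    scores = chi2_scores(spaces[dst], labels, k)
                    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
                    selected[dst] = sorted(ranked[:keep])
    return selected, labels


# Toy data (hypothetical): six documents, two feature spaces.
# In each space the first two features separate the two groups; the rest are noise.
text = [[1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 1, 1],
        [0, 1, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1]]
link = [[1, 0, 1], [1, 0, 0], [1, 0, 1],
        [0, 1, 0], [0, 1, 1], [0, 1, 0]]

selected, labels = coselect({"text": text, "link": link}, k=2, keep=2)
print(selected, labels)
```

On this toy input the noisy features are dropped from both spaces and the two document groups are recovered. A real system would use sparse term weights, a stronger clustering routine, and whichever selection criterion is under study.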