Text Document Clustering with Hybrid Feature Selection

Authors:
Asmaa Benghabrit; Bouchra Frikh; Brahim Ouhbi; El Moukhtar Zemmouri; Hicham Behja
Affiliations:
LM2I laboratory, ENSAM, Moulay Ismaïl University, BP 4024 Marjane II, Meknès, Morocco; LTTI laboratory, EST-Fès, Moulay Abdellah University, BP 1796 Atlas Fès, Fès, Morocco; LM2I laboratory, ENSAM, Moulay Ismaïl University, BP 4024 Marjane II, Meknès, Morocco; LM2I laboratory, ENSAM, Moulay Ismaïl University, BP 4024 Marjane II, Meknès, Morocco; LM2I laboratory, ENSAM, Moulay Ismaïl University, BP 4024 Marjane II, Meknès, Morocco
Venue:
Proceedings of International Conference on Information Integration and Web-based Applications & Services
Year:
2013

Abstract

Finding the appropriate information and understanding it is a delicate task for human researchers faced with the enormous number of unstructured texts created every day. Clustering algorithms, a powerful family of text mining tools, are designed to address this problem. In this paper, we propose a novel text document clustering approach based on a new hybrid feature selection method that we call HFSM. This technique extracts statistically and semantically relevant terms to guide the clustering process. Experiments conducted on the Reuters corpus demonstrate the practicality of our algorithm and show that it produces more accurate clusterings than those obtained by other existing algorithms.
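The abstract does not specify how HFSM scores or combines terms, so the following is only a rough sketch of the general shape of a "select features, then cluster" pipeline, not the authors' method. It assumes a statistical score (TF-IDF summed over the corpus) blended with a semantic relevance score supplied by the caller (in practice this might come from a lexical resource such as WordNet, which is an assumption here). It keeps the top k terms and then runs plain k-means on the reduced document vectors. The function names, the weight `alpha`, and the toy semantic values are all hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Smoothed TF-IDF weights, one {term: weight} dict per non-empty tokenized document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency per term
    return [
        {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(doc).items()}
        for doc in docs
    ]

def hybrid_select(vecs, semantic, k, alpha=0.5):
    """Keep the k terms with the highest blend of statistical and semantic scores.

    Statistical part: TF-IDF summed over the corpus, rescaled to [0, 1].
    Semantic part: caller-supplied relevance in [0, 1] (hypothetical input).
    """
    stat = Counter()
    for v in vecs:
        stat.update(v)  # adds each document's weights to the running totals
    top = max(stat.values())
    score = {t: alpha * s / top + (1 - alpha) * semantic.get(t, 0.0)
             for t, s in stat.items()}
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(score, key=lambda t: (-score[t], t))[:k]

def project(vecs, terms):
    """Restrict each document vector to the selected terms and L2-normalize it."""
    points = []
    for v in vecs:
        x = [v.get(t, 0.0) for t in terms]
        norm = math.sqrt(sum(a * a for a in x)) or 1.0  # avoid dividing by zero
        points.append([a / norm for a in x])
    return points

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (Euclidean); the first k points seed the centroids."""
    centroids = [p[:] for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = [
    "match goal team win".split(),
    "stock market shares fell".split(),
    "team coach match season".split(),
    "market bank shares profit".split(),
]
semantic = {"team": 0.9, "match": 0.9, "market": 0.9, "shares": 0.9}  # toy values
vecs = tfidf(docs)
terms = hybrid_select(vecs, semantic, k=4)
labels = kmeans(project(vecs, terms), k=2)
print(terms, labels)  # ['market', 'match', 'shares', 'team'] [0, 1, 0, 1]
```

Seeding k-means with the first k documents only keeps this toy run deterministic; a realistic setup would more likely use k-means++ seeding or several restarts, and the blending weight `alpha` would need to be tuned on held-out data.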