Web page clustering is a fundamental technique for managing, locating, and interpreting Web data, and for helping users navigate, discriminate among, and understand Web pages. Most existing clustering algorithms do not adapt well to Web page clustering, in either efficiency or effectiveness, because of high dimensionality and data sparseness. Furthermore, the uncontrolled nature of Web content presents additional challenges, whereas the interconnected structure of hypertext can provide useful information for the clustering process. To address these problems, we propose a new Web page clustering method that combines the content of neighboring pages to overcome data sparseness and uses Iterative Feature Selection to remove noisy and redundant features and improve clustering performance. Experimental results show that the proposed method significantly improves the performance of Vietnamese Web page clustering while using a relatively small number of good descriptive features.
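To make the neighbor-combination idea concrete, here is a minimal sketch of how a sparse page vector could be enriched with terms from the pages it links to. It is not the paper's implementation: the function names, whitespace tokenization, raw term counts, and the neighbor weight `alpha` are all assumptions made for illustration, and the Iterative Feature Selection step is not shown.

```python
from collections import Counter


def term_counts(text):
    """Bag-of-words counts from lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def combine_with_neighbors(pages, links, alpha=0.5):
    """Return one weighted term vector per page.

    pages: dict mapping page id -> page text
    links: dict mapping page id -> iterable of neighbor page ids
    alpha: hypothetical weight given to neighbor terms (an assumption,
           not a value taken from the paper)
    """
    own = {pid: term_counts(text) for pid, text in pages.items()}
    combined = {}
    for pid, counts in own.items():
        # Start from the page's own term counts.
        vec = {term: float(c) for term, c in counts.items()}
        # Add each linked neighbor's counts, scaled down by alpha.
        for nid in links.get(pid, ()):
            for term, c in own.get(nid, {}).items():
                vec[term] = vec.get(term, 0.0) + alpha * c
        combined[pid] = vec
    return combined


pages = {"a": "web page clustering", "b": "clustering algorithm"}
links = {"a": ["b"]}
vectors = combine_with_neighbors(pages, links, alpha=0.5)
```

In this toy input, page `a` gains the term `algorithm` from its neighbor `b`, so a short page ends up with a denser vector before any clustering or feature selection is applied.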