With the growth of web-based applications and the rising popularity of the World Wide Web (WWW), the Web has become the largest source of information in the world, which makes extracting relevant information increasingly difficult. Moreover, the content of web sites changes constantly, leading to continual changes in Web users' behaviour. There is therefore significant interest in analysing web content data to better serve users. Our proposed approach, which is grounded in automatic textual analysis of a web site independently of its usage, attempts to define groups of documents dealing with the same topic. Both document clustering and word clustering are well-studied problems. However, most existing algorithms cluster documents and words separately rather than simultaneously. In this paper, we propose applying a block clustering algorithm to categorize the pages of a web site according to their content. We report the results of our recent tests of the CROKI2 algorithm on a tourist web site.
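
To make the idea of clustering documents and words simultaneously concrete, the following is a minimal sketch of a generic alternating block clustering scheme on a toy page-by-word presence matrix. It is not the CROKI2 algorithm evaluated in the paper: it swaps in a simple squared-error criterion, and the page names, word list, matrix values, function names, and the deterministic initialisation are all assumptions made for illustration.

```python
# Toy page-by-word presence matrix (hypothetical data, not from the paper).
PAGES = ["hotels", "rooms", "beaches", "booking", "seaside"]
WORDS = ["hotel", "room", "beach", "price", "sea"]
X = [
    [1, 1, 0, 1, 0],  # hotels
    [1, 1, 0, 1, 0],  # rooms
    [0, 0, 1, 0, 1],  # beaches
    [1, 1, 0, 1, 0],  # booking
    [0, 0, 1, 0, 1],  # seaside
]


def block_means(X, rows, cols, K, L):
    """Mean of X inside each (row cluster, column cluster) block."""
    total = [[0.0] * L for _ in range(K)]
    count = [[0] * L for _ in range(K)]
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            total[rows[i]][cols[j]] += value
            count[rows[i]][cols[j]] += 1
    # An empty block gets mean 0.0 (an arbitrary choice for this sketch).
    return [[total[k][l] / count[k][l] if count[k][l] else 0.0
             for l in range(L)] for k in range(K)]


def block_cluster(X, K, L, max_iter=20):
    """Alternate row and column reassignment until both partitions are stable."""
    n, m = len(X), len(X[0])
    # Assumed initialisation: contiguous chunks of rows and of columns.
    rows = [i * K // n for i in range(n)]
    cols = [j * L // m for j in range(m)]
    for _ in range(max_iter):
        # Row step: move each page to the row cluster whose block means fit it best.
        mu = block_means(X, rows, cols, K, L)
        new_rows = [
            min(range(K),
                key=lambda k: sum((X[i][j] - mu[k][cols[j]]) ** 2 for j in range(m)))
            for i in range(n)
        ]
        # Column step: same idea for words, using the updated page partition.
        mu = block_means(X, new_rows, cols, K, L)
        new_cols = [
            min(range(L),
                key=lambda l: sum((X[i][j] - mu[new_rows[i]][l]) ** 2 for i in range(n)))
            for j in range(m)
        ]
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    return rows, cols


row_labels, col_labels = block_cluster(X, K=2, L=2)
print(dict(zip(PAGES, row_labels)))
print(dict(zip(WORDS, col_labels)))
```

On this toy matrix the lodging pages and the beach pages end up in different page clusters, and each page cluster is paired with the word cluster that characterises it, which is the kind of page/word block structure the approach aims to recover. Like other alternating schemes, this sketch can stop at a local optimum, so the outcome depends on the initial partitions.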