In this paper, we analyze several widely employed clustering algorithms for identifying duplicated or cloned pages in web applications. Specifically, we consider four algorithms:

- an agglomerative hierarchical clustering algorithm;
- a divisive clustering algorithm;
- the k-means partitional clustering algorithm;
- a partitional competitive clustering algorithm, namely Winner Takes All (WTA).

All of these algorithms take as input a matrix of the distances between the structures of the web pages. The distance between two pages is computed by applying the Levenshtein edit distance to the strings that encode the pages' sequences of HTML tags.
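The following Python sketch illustrates the distance computation described above. It extracts the sequence of HTML tags from each page with the standard library's `html.parser`, computes the Levenshtein edit distance between two tag sequences, and fills a symmetric distance matrix.

The encoding details are assumptions, not the paper's exact scheme:

- each opening or closing tag is treated as one symbol;
- attributes and text are ignored;
- distances are not normalized.

The `single_linkage_clusters` helper and its `threshold` parameter are hypothetical. They only show how an agglomerative algorithm can consume the matrix. The divisive, k-means, and WTA algorithms are not reproduced here.

```python
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects opening and closing tags in document order.

    Assumption: only tag names are kept; attributes and text are ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are recorded once, as an opening tag.
        self.tags.append(f"<{tag}>")


def tag_sequence(html: str) -> list[str]:
    """Return the page's tag sequence; each tag is one symbol (assumed encoding)."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def levenshtein(a, b) -> int:
    """Dynamic-programming edit distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def distance_matrix(pages: list[str]) -> list[list[int]]:
    """Symmetric matrix of edit distances between the pages' tag sequences."""
    seqs = [tag_sequence(p) for p in pages]
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = levenshtein(seqs[i], seqs[j])
    return d


def single_linkage_clusters(d, threshold):
    """Hypothetical agglomerative single-linkage clustering.

    Repeatedly merges the two closest clusters while their distance is <= threshold.
    Returns lists of page indices.
    """
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        if dist > threshold:
            break
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters


if __name__ == "__main__":
    pages = [
        "<html><body><h1>A</h1><p>x</p></body></html>",
        "<html><body><h1>B</h1><p>y</p></body></html>",
        "<html><body><table><tr><td>z</td></tr></table></body></html>",
    ]
    d = distance_matrix(pages)
    print(d)                                      # [[0, 0, 6], [0, 0, 6], [6, 6, 0]]
    print(single_linkage_clusters(d, threshold=0))  # [[0, 1], [2]]
```

The first two pages differ only in text content, so their tag sequences are identical and they fall into the same cluster. Treating each tag as one symbol keeps the edit distance independent of tag-name length. Comparing raw character strings would instead make a `<table>`/`<p>` substitution cost more than an `<h1>`/`<h2>` one.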