In this paper we investigate the use of clustering algorithms in reverse engineering to identify web pages that are similar at either the structural level or the content level. To this end, we have used two instances of a general process that differ only in the measure used to compare web pages: at the structural level, pages are compared using the Levenshtein edit distance; at the content level, they are compared using Latent Semantic Indexing. The static pages of two web applications and one static web site have been used to compare the results achieved by the considered clustering algorithms at both levels. On these applications the algorithms generally achieved comparable results. The investigation has also suggested some heuristics for quickly identifying the best partition of web pages into clusters, among the possible partitions, at both the structural and the content level.
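The structural comparison can be illustrated with a minimal sketch. The abstract does not say how page structure is encoded, so the code below assumes each page is reduced to its sequence of opening and closing tag names; the names `TagCollector`, `tag_sequence`, and `structural_distance`, and the normalization by the longer sequence, are hypothetical choices made for illustration and are not the authors' implementation.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects opening and closing tag names, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(html):
    """Assumed structural encoding: the ordered list of tag names in a page."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def structural_distance(html_a, html_b):
    """Edit distance scaled to [0, 1] by the longer tag sequence (an assumption)."""
    seq_a, seq_b = tag_sequence(html_a), tag_sequence(html_b)
    longest = max(len(seq_a), len(seq_b))
    return levenshtein(seq_a, seq_b) / longest if longest else 0.0


page_a = "<html><body><p>one</p></body></html>"
page_b = "<html><body><p>two</p><p>three</p></body></html>"
print(structural_distance(page_a, page_b))
```

A pairwise matrix of such distances could then be handed to any clustering algorithm that accepts precomputed distances. The content-level comparison would instead build a term-by-page matrix and compare pages in a reduced latent space, which this sketch does not cover.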