An Approach to Identify Duplicated Web Pages

Authors:
Giuseppe A. Di Lucca;Massimiliano Di Penta;Anna Rita Fasolino
Venue:
COMPSAC '02 Proceedings of the 26th International Computer Software and Applications Conference on Prolonging Software Life: Development and Redevelopment
Year:
2002

References: 0
Cited by: 22

Duplicate detection in click streams
WWW '05 Proceedings of the 14th international conference on World Wide Web

Recovering conceptual models from web applications
SIGDOC '06 Proceedings of the 24th annual ACM international conference on Design of communication

Comparison and Evaluation of Clone Detection Tools
IEEE Transactions on Software Engineering

Improving Web site understanding with keyword-based clustering
Journal of Software Maintenance and Evolution: Research and Practice

Empirical evaluation of clone detection using syntax suffix trees
Empirical Software Engineering

A Visual Technique for Web Pages Comparison
Electronic Notes in Theoretical Computer Science (ENTCS)

An evaluation of code similarity identification for the grow-and-prune model
Journal of Software Maintenance and Evolution: Research and Practice - Special Issue on the 12th Conference on Software Maintenance and Reengineering (CSMR 2008)

Comparison and evaluation of code clone detection techniques and tools: A qualitative approach
Science of Computer Programming

Partial Similarity of Objects, or How to Compare a Centaur to a Horse
International Journal of Computer Vision

A Model Checking-based Method for Verifying Web Application Design
Electronic Notes in Theoretical Computer Science (ENTCS)

WAVer: A Model Checking-based Tool to Verify Web Application Design
Electronic Notes in Theoretical Computer Science (ENTCS)

Comparing clustering algorithms for the identification of similar pages in web applications
ICWE'07 Proceedings of the 7th international conference on Web engineering

Perturbation-based user-input-validation testing of web applications
Journal of Systems and Software

Web content outlier mining through mathematical approach and trust rating
ACACOS'11 Proceedings of the 10th WSEAS international conference on Applied computer and applied computational science

Analyzing web service similarity using contextual clones
Proceedings of the 5th International Workshop on Software Clones

Identifying cloned navigational patterns in web applications
Journal of Web Engineering

An investigation of clustering algorithms in the identification of similar web pages
Journal of Web Engineering

Design verification of web applications using symbolic model checking
ICWE'05 Proceedings of the 5th international conference on Web Engineering

An investigation of cloning in web applications
ICWE'05 Proceedings of the 5th international conference on Web Engineering

A sentence-based copy detection approach for web documents
FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part I

Augmenting test suites effectiveness by increasing output diversity
Proceedings of the 34th International Conference on Software Engineering

Detecting source code similarity using code abstraction
Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication

Abstract

A relevant consequence of the unceasing expansion of the Web and e-commerce is the growing demand for new Web sites and Web applications. As a result, Web sites and applications are usually developed without a formalized process: Web pages are coded directly and incrementally, and new pages are obtained by duplicating existing ones. Duplicated Web pages, which have the same structure and differ only in the data they include, can be considered clones. Identifying clones may reduce the effort devoted to testing, maintaining, and evolving Web sites and applications. Moreover, clone detection across different Web sites can reveal cases of possible plagiarism. In this paper we propose an approach, based on similarity metrics, to detect duplicated pages in Web sites and applications implemented with HTML and ASP. The proposed approach has been assessed by analyzing several Web sites and Web applications. The results are reported in the paper with respect to some case studies.
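
The abstract does not say which similarity metrics are used, so the following is only an illustrative sketch of the general idea of comparing pages by structure rather than by content. It assumes, as one plausible choice, an edit distance computed over the sequence of HTML tag names with text and attributes ignored. The names `TagCollector`, `tag_sequence`, `edit_distance`, and `structural_similarity` are hypothetical and do not come from the paper.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(html):
    """Return the page structure as a list of tag names."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def edit_distance(a, b):
    """Levenshtein distance between two sequences (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def structural_similarity(html_a, html_b):
    """Similarity in [0, 1]; 1.0 means identical tag sequences."""
    seq_a, seq_b = tag_sequence(html_a), tag_sequence(html_b)
    longest = max(len(seq_a), len(seq_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(seq_a, seq_b) / longest


# Two pages with the same structure but different data are candidate clones.
page_a = "<html><body><h1>Books</h1><p>Price: 10</p></body></html>"
page_b = "<html><body><h1>Music</h1><p>Price: 25</p></body></html>"
print(structural_similarity(page_a, page_b))
```

Under this assumption, pages that differ only in their data score as identical, which matches the abstract's notion of a clone. A real tool would likely need a threshold below 1.0 to catch near-duplicates, and the choice of that threshold is not addressed here.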