A Structured Approach to Data Reverse Engineering of Web Applications

Authors:
Roberto Virgilio;Riccardo Torlone
Affiliations:
Università Roma Tre, Italy;Università Roma Tre, Italy
Venue:
ICWE '09 Proceedings of the 9th International Conference on Web Engineering
Year:
2009

Cites (13)

A brief survey of web data extraction tools
ACM SIGMOD Record

Understanding and Restructuring Web Sites with ReWeb
IEEE MultiMedia

Reverse Engineering and Design Recovery: A Taxonomy
IEEE Software

Visual Web Information Extraction with Lixto
Proceedings of the 27th International Conference on Very Large Data Bases

Flexible Reverse Engineering of Web Pages with VAQUISTA
WCRE '01 Proceedings of the Eighth Working Conference on Reverse Engineering (WCRE'01)

Reverse Software Engineering with UML for Web Site Maintenance
WISE '00 Proceedings of the First International Conference on Web Information Systems Engineering (WISE'00) - Volume 2

LZW Based Compressed Pattern Matching
DCC '04 Proceedings of the Conference on Data Compression

Reverse engineering web applications: the WARE approach
Journal of Software Maintenance and Evolution: Research and Practice - Special issue: Web site evolution

Clustering web pages based on their structure
Data & Knowledge Engineering - Special issue: WIDM 2003

Acquiring owl ontologies from data-intensive web sites
ICWE '06 Proceedings of the 6th international conference on Web engineering

Adapting Web information extraction knowledge via mining site-invariant and site-dependent features
ACM Transactions on Internet Technology (TOIT)

A Meta-model Approach to the Management of Hypertexts in Web Information Systems
ER '08 Proceedings of the ER 2008 Workshops (CMLSA, ECDM, FP-UML, M2AS, RIGiM, SeCoGIS, WISM) on Advances in Conceptual Modeling: Challenges and Opportunities

Extracting content structure for web pages based on visual representation
APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications

Cited by (3)

Automatic web page annotation with google rich snippets
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems: Part II

RDFa based annotation of web pages through keyphrases extraction
OTM'11 Proceedings of the 2011 Confederated international conference on On the move to meaningful internet systems - Volume Part II

A reverse engineering approach for automatic annotation of Web pages
Multimedia Tools and Applications

Abstract

Most documents on the Web are written in HTML and constitute a huge amount of legacy data. These documents are formatted for visual purposes only, and their styles differ because of diverse authorship and goals, which makes the retrieval and integration of Web content difficult to automate. We contribute to the solution of this problem by proposing a structured approach to data reverse engineering of data-intensive Web sites. We focus on data content and on the way in which such content is structured on the Web. We use a Web data model to describe abstract structural features of HTML pages, and we propose a method for segmenting HTML documents into special blocks that group semantically related Web objects. We have developed a tool based on this method that supports the identification of the structure, function, and meaning of data organized in Web object blocks. Using this tool, we demonstrate the feasibility and effectiveness of our approach on a set of real Web sites.
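
The abstract does not spell out the segmentation algorithm, so the following is only a minimal illustrative sketch of the general idea of splitting an HTML page into blocks of related content. It assumes a simple heuristic that is not taken from the paper: each text fragment is assigned to its nearest enclosing container element. The names `BLOCK_TAGS`, `BlockSegmenter`, and `segment_blocks` are hypothetical, and the chosen set of container tags is an assumption.

```python
from html.parser import HTMLParser

# Assumption: these tags act as block containers. The paper does not give this set.
BLOCK_TAGS = {"div", "table", "ul", "ol", "section", "article", "nav", "form"}
# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "col", "wbr"}


class BlockSegmenter(HTMLParser):
    """Groups text fragments by their nearest enclosing container element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, block index or None)
        self.blocks = []  # one (tag, [text fragments]) entry per container

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag in BLOCK_TAGS:
            self.blocks.append((tag, []))
            self.stack.append((tag, len(self.blocks) - 1))
        else:
            self.stack.append((tag, None))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attach the text to the innermost open container, if there is one.
        for _, index in reversed(self.stack):
            if index is not None:
                self.blocks[index][1].append(text)
                return


def segment_blocks(html):
    """Return the non-empty blocks of a page in document order."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    return [(tag, texts) for tag, texts in parser.blocks if texts]


SAMPLE = """
<div><h1>Books</h1><ul><li>Title A</li><li>Title B</li></ul></div>
<div><p>Contact us</p></div>
"""

for tag, texts in segment_blocks(SAMPLE):
    print(tag, texts)
```

On the sample page, the list items end up in one block and the contact paragraph in another. A real segmentation method of the kind the abstract describes would presumably also use structural or visual cues beyond tag nesting.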