A methodical approach to extracting interesting objects from dynamic web pages
International Journal of Web and Grid Services
One of the fundamental building blocks of the Semantic Web is the generation of wrappers (executable programs) specialized in extracting and annotating interesting information from heterogeneous Web data sources. Most wrappers are programmed either manually or by wrapper generation systems supervised by human experts in wrapper production. This dissertation is dedicated to developing a systematic framework and a suite of system-level facilities for automatic application generation. We present the XWRAP design philosophy, methodology, and engineering techniques for generating information extraction applications. We demonstrate the benefits and unique features of the XWRAP framework and techniques for extracting and annotating Web data through three prototype systems: XWRAP Original, XWRAP Elite, and XWRAP Composer. XWRAP technology provides a number of competitive advantages. First, XWRAP Elite can generate wrappers in minutes with code quality and efficiency equivalent to that achieved by human experts. Second, XWRAP systems support a variety of transformations of Web data. Third, XWRAP Composer is the only wrapper generator to date that is capable of extracting, aggregating, and filtering information from multiple Web pages with workflow dependencies. This dissertation also reports extensive experiments conducted on XWRAP systems, showing the efficiency, trade-offs, and code quality of the XWRAP wrapper applications.