Information and content integration are believed to be a possible solution to the problem of information overload on the Internet. This article gives an overview of a simple solution for integrating information and content on the Web. Previous approaches to content extraction and integration are discussed, followed by the introduction of a novel technology, based on XML processing, that addresses these problems. The article also presents lessons learned from dealing with changing webpage layouts, markup that does not comply with HTML standards, and queries that return multiple results. The method, which applies relative XPath queries over the DOM tree, proves to be more robust than previous approaches to Web information integration. Furthermore, the prototype implementation demonstrates that the approach is simple enough for non-professional users to adopt easily in their day-to-day information management routines.
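The abstract names three practical problems and claims that relative XPath queries over the DOM tree handle them robustly. The problems are changing layouts, markup that is not well-formed, and queries that return several matches. The sketch below illustrates that idea. It is hypothetical and is not the article's implementation.

It uses only the Python standard library:

- A lenient `html.parser` subclass builds an ElementTree DOM from HTML that `ET.fromstring` would reject.
- The DOM is queried with ElementTree's limited XPath subset. That subset lacks axes such as `following-sibling`, so a full XPath 1.0 engine would normally be used instead.

The page markup, the `Price:` label, and all class and function names are invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements have no end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LenientTreeBuilder(HTMLParser):
    """Hypothetical builder: HTML that is not well-formed XML -> ElementTree.

    Tolerates void elements (<br>), unquoted attributes and stray end tags.
    It does NOT implement HTML's implied end tags (e.g. an unclosed <td>).
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # synthetic root above <html>
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as "".
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # ElementTree stores text before the first child in .text and
        # text after a child in that child's .tail.
        parent = self.stack[-1]
        if len(parent):
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse(html):
    builder = LenientTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract_absolute(root):
    # Brittle: hard-codes the full path from the document root.
    cell = root.find("html/body/table/tr[1]/td[2]")
    return None if cell is None else "".join(cell.itertext()).strip()


def extract_relative(root, label):
    """Return the value cell of EVERY row labelled `label` (possibly several).

    `label` must not contain a single quote (ElementPath literal syntax).
    """
    values = []
    # Anchor: any <tr> with a <td> child whose full text equals the label.
    for row in root.iterfind(f".//tr[td='{label}']"):
        # Relative query evaluated from the anchor row, not the document root.
        cell = row.find("td[2]")
        if cell is not None:
            values.append("".join(cell.itertext()).strip())
    return values


# Invalid XML: unquoted attribute, unclosed <br>, stray </span>.
PAGE_V1 = """<html><body>
<table border=1>
<tr><td>Price:</td><td>19.99 EUR</td></tr>
<tr><td>Stock:</td><td>in stock<br></td></tr>
</table></span>
</body></html>"""

# Same site after a redesign: extra wrapper, heading, and two products.
PAGE_V2 = """<html><body><div class=main>
<h1>Offers</h1>
<table border=1>
<tr><td>Name:</td><td>Lamp</td></tr>
<tr><td>Price:</td><td>19.99 EUR</td></tr>
<tr><td>Name:</td><td>Desk</td></tr>
<tr><td>Price:</td><td>149.00 EUR</td></tr>
</table>
</div></body></html>"""

if __name__ == "__main__":
    for page in (PAGE_V1, PAGE_V2):
        root = parse(page)
        print(extract_absolute(root), extract_relative(root, "Price:"))
    # 19.99 EUR ['19.99 EUR']
    # None ['19.99 EUR', '149.00 EUR']
```

On the redesigned page the absolute path finds nothing. The query anchored on the label text still returns every matching price, as a list rather than a single value. The design choice in this sketch is to anchor on content likely to survive a redesign (a field label) and then take only a short relative step from it. Changes elsewhere in the page therefore do not affect the result.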