Data presented on commerce sites runs into thousands of pages, and is typically delivered from multiple back-end sources. This makes it difficult to identify incorrect, anomalous, or interesting data such as $9.99 air fares, missing links, drastic changes in prices and addition of new products or promotions. In this paper, we describe a system that monitors Web sites automatically and generates various types of reports so that the content of the site can be monitored and the quality maintained. The solution designed and implemented by us consists of a site crawler that crawls dynamic pages, an information miner that learns to extract useful information from the pages based on examples provided by the user, and a reporter that can be configured by the user to answer specific queries. The tool can also be used for identifying price trends and new products or promotions at competitor sites. A pilot run of the tool has been successfully completed at the ibm.com site.
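To make the reporter's role concrete, the following is a minimal sketch of the kind of check such a component might run once the crawler and information miner have produced product and price records. It is not the system's actual implementation: the `Finding` type, the `compare_snapshots` function, the snapshot format (a mapping from product name to price), and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    kind: str     # "new_product", "removed_product", "price_change", "suspicious_price"
    product: str
    detail: str


def compare_snapshots(previous, current, max_change=0.5, min_price=10.0):
    """Compare two {product: price} snapshots and return a list of findings.

    max_change (relative change) and min_price are illustrative thresholds,
    not values taken from the paper.
    """
    findings = []

    # Products that appear only in the newer crawl.
    for product in sorted(current.keys() - previous.keys()):
        findings.append(Finding("new_product", product, f"price {current[product]:.2f}"))

    # Products that disappeared since the earlier crawl.
    for product in sorted(previous.keys() - current.keys()):
        findings.append(Finding("removed_product", product, "no longer listed"))

    # Drastic relative price changes for products present in both crawls.
    for product in sorted(current.keys() & previous.keys()):
        old, new = previous[product], current[product]
        if old > 0 and abs(new - old) / old > max_change:
            findings.append(Finding("price_change", product, f"{old:.2f} -> {new:.2f}"))

    # Prices low enough to look like a data error (for example a $9.99 air fare).
    for product in sorted(current):
        if current[product] < min_price:
            findings.append(Finding("suspicious_price", product, f"price {current[product]:.2f}"))

    return findings


if __name__ == "__main__":
    previous = {"fare NYC-LON": 450.0, "laptop": 1200.0, "mouse": 25.0}
    current = {"fare NYC-LON": 9.99, "laptop": 1150.0, "keyboard": 40.0}
    for finding in compare_snapshots(previous, current):
        print(finding.kind, finding.product, finding.detail)
```

A configurable reporter would presumably let the user choose which of these checks to run and with what thresholds; here they are plain keyword arguments.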