EShopMonitor: A Web Content Monitoring Tool

  • Authors:
  • Neeraj Agrawal;Rema Ananthanarayanan;Rahul Gupta;Sachindra Joshi;Raghu Krishnapuram;Sumit Negi

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

Data presented on commerce sites runs into thousands of pages, and is typically delivered from multiple back-end sources. This makes it difficult to identify incorrect, anomalous, or interesting data such as $9.99 air fares, missing links, drastic changes in prices and addition of new products or promotions. In this paper, we describe a system that monitors Websites automatically and generates various types of reports so that the content of the site can be monitored and the quality maintained. The solution designed and implemented by us consists of a site crawler that crawls dynamic pages, an information miner that learns to extract useful information from the pages based on examples provided by the user, and a reporter that can be configured by the user to answer specific queries. The tool can also be used for identifying price trends and new products or promotions at competitor sites. A pilot run of the tool has been successfully completed at the ibm.com site.
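
To make the reporting step concrete, the following is a minimal sketch of how two snapshots of extracted product prices might be compared to surface the kinds of items the abstract mentions (suspiciously low prices, drastic price changes, new or removed products). It is not the authors' implementation: the `Finding` type, the `compare_snapshots` function, the snapshot layout (a plain product-to-price mapping), and both thresholds are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    product: str
    kind: str    # "new", "removed", "price_change", or "low_price"
    detail: str


def compare_snapshots(previous, current, max_change=0.5, min_price=10.0):
    """Compare two {product: price} snapshots and list notable differences.

    max_change (relative change) and min_price are hypothetical thresholds,
    not values taken from the paper.
    """
    findings = []
    for product in sorted(current):
        price = current[product]
        # Flag prices below the assumed plausibility floor.
        if price < min_price:
            findings.append(Finding(product, "low_price", f"{price:.2f}"))
        # Products absent from the earlier snapshot are reported as new.
        if product not in previous:
            findings.append(Finding(product, "new", f"{price:.2f}"))
            continue
        old = previous[product]
        # Flag relative changes larger than the assumed threshold.
        if old > 0 and abs(price - old) / old > max_change:
            findings.append(
                Finding(product, "price_change", f"{old:.2f} -> {price:.2f}")
            )
    # Products that disappeared since the earlier snapshot.
    for product in sorted(set(previous) - set(current)):
        findings.append(Finding(product, "removed", f"{previous[product]:.2f}"))
    return findings


if __name__ == "__main__":
    previous = {"fare NYC-BOS": 129.0, "laptop": 999.0, "mouse": 25.0}
    current = {"fare NYC-BOS": 9.99, "laptop": 979.0, "keyboard": 49.0}
    for finding in compare_snapshots(previous, current):
        print(finding.kind, finding.product, finding.detail)
```

In this sketch the crawling and extraction stages are assumed to have already produced the two price mappings; a configurable reporter along the lines described above would presumably let the user choose which kinds of findings to include and how the thresholds are set.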