Automatically Mining Result Records from Search Engine Response Pages

Authors:
Dheerendranath Mundluru;Jayasimha Reddy Katukuri;Saygin Celebi
Affiliations:
University of Louisiana at Lafayette;University of Louisiana at Lafayette;University of Louisiana at Lafayette
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Abstract

Web applications such as deep Web crawlers, metasearch engines, and other Web mining systems often need to extract the result records displayed on the response pages that search engines return for submitted queries. Extracting such records is challenging because search engines differ in how they display their records. In addition, the response pages returned by many search engines include noisy content such as advertisements and suggestion links, which complicates the extraction task further. In this paper, we propose a highly effective and efficient algorithm for automatically mining result records from search engine response pages.