Search results generated by searchable databases are served dynamically and are far larger in total than the static documents on the Web. These results pages have been referred to as the Deep Web. To integrate results across different searchable databases, we need to extract the target data from their results pages. We propose a test bed for information extraction from search results. We chose 100 databases at random from 114,540 pages with search forms, so the chosen databases cover a good variety. From these, we selected 51 databases whose results pages include URLs and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods, as well as methods for extending the target data.