Current search engines leave a large amount of information on the World Wide Web outside their indexes. This is because crawlers work by following hyperlinks and a few other references, ignoring HTML forms. In this paper, we propose a search engine prototype that can retrieve information behind HTML forms by automatically generating queries for them. We describe its architecture, some implementation details, and an experiment showing that this information is indeed not indexed by current search engines.
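The core idea of querying a form automatically can be illustrated with a minimal sketch: parse a page's first HTML form, fill its text fields with a candidate term, and build the resulting GET request URL. This is an assumption-laden illustration, not the paper's implementation; the function and class names (`FormParser`, `generate_query_url`) are hypothetical, and real forms would also require handling POST submissions, select boxes, and session state.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

class FormParser(HTMLParser):
    """Collects the action URL and input fields of a page's first HTML form."""
    def __init__(self):
        super().__init__()
        self.action = None   # form's action attribute, once seen
        self.fields = {}     # input name -> default value
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action", "")
            self._in_form = True
        elif tag == "input" and self._in_form:
            name = attrs.get("name")
            # Keep only fields a crawler could plausibly fill or pass through.
            if name and attrs.get("type", "text") in ("text", "search", "hidden"):
                self.fields[name] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False

def generate_query_url(page_url, html, term):
    """Fill each empty text field of the page's first form with `term`
    and return the corresponding GET query URL (simple-GET-form assumption)."""
    parser = FormParser()
    parser.feed(html)
    params = {k: (term if v == "" else v) for k, v in parser.fields.items()}
    return urljoin(page_url, parser.action) + "?" + urlencode(params)

# Example: a one-field search form, as commonly found on "hidden web" sites.
html = '<form action="/search"><input type="text" name="q"></form>'
url = generate_query_url("http://example.org/", html, "hidden web")
print(url)  # http://example.org/search?q=hidden+web
```

Fetching the URL produced this way and indexing the response pages is what would expose form-guarded content to a crawler that otherwise only follows hyperlinks.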