The trade-off between the size of the retrieved data and crawling time is a well-known dilemma in the structured Web crawling community. The real challenge is to tune the settings of a given wrapper so that it retrieves the maximum available data in the shortest possible time. In this paper, we tune these settings in two steps. First, we introduce a threaded algorithm that guarantees access to all available detail pages within the crawling scope. Second, using this algorithm, we try to reduce the time consumed by the crawler through simple adjustments of the sleeping time after each detail page visit.
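The idea of a threaded crawler with a tunable sleeping time can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details are not given here. It only assumes that the detail page URLs are known in advance and that a fixed pause follows each visit. The names `crawl_detail_pages`, `fake_fetch`, `num_threads`, and `sleep_seconds` are hypothetical, and `fake_fetch` stands in for a real HTTP request.

```python
import threading
import time
from queue import Queue, Empty


def fake_fetch(url):
    """Hypothetical stand-in for a real HTTP request to a detail page."""
    return f"<html>{url}</html>"


def crawl_detail_pages(urls, fetch, num_threads=4, sleep_seconds=0.0):
    """Visit every URL exactly once, pausing after each visit.

    Returns (pages, failed, elapsed): fetched pages keyed by URL, the list
    of URLs whose fetch raised an exception, and the wall-clock time spent.
    """
    pending = Queue()
    for url in urls:
        pending.put(url)

    pages = {}
    failed = []
    lock = threading.Lock()  # protects pages and failed

    def worker():
        while True:
            try:
                url = pending.get_nowait()
            except Empty:
                return  # queue drained, so this thread is done
            try:
                page = fetch(url)
                with lock:
                    pages[url] = page
            except Exception:
                with lock:
                    failed.append(url)
            # Assumed politeness delay after each detail page visit.
            time.sleep(sleep_seconds)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - start
    return pages, failed, elapsed


if __name__ == "__main__":
    detail_urls = [f"http://example.com/item/{i}" for i in range(20)]
    for pause in (0.05, 0.01):
        pages, failed, elapsed = crawl_detail_pages(
            detail_urls, fake_fetch, num_threads=4, sleep_seconds=pause
        )
        print(f"sleep={pause}s pages={len(pages)} failed={len(failed)} "
              f"time={elapsed:.2f}s")
```

Because every URL is placed on the queue before the workers start and each worker exits only when the queue is empty, every detail page in scope is attempted once. Varying `sleep_seconds` then shows how the pause between visits affects total crawling time.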