Cutting logistics costs through a new transportation network design has recently emerged as a major issue for logistics-related firms. Designing an intermodal transportation network is difficult, however, because the operation schedule information the design requires is scattered across many sources. Solving this problem requires a technology that can collect and provide schedule data for all transportation modes. In this research, an agent system that identifies and collects schedule information published on the Internet was developed. The schedule-collecting agent system consists of two components: a Schedule Crawler, which extracts URLs (Uniform Resource Locators) from HTML (HyperText Markup Language) documents, finds HTML pages, and identifies pages that provide schedules; and a Web Robot, which detects data changes on the identified schedule-providing pages, extracts the schedule data, and saves it in a schedule database. In addition, an algorithm and heuristics were developed to recognize only the schedule information on a given Web page. To evaluate the system, an experiment was carried out on 135 HTML pages providing shipping schedule information from four shipping companies. The system identified schedule information pages at a rate of 99.8% and extracted schedule data from those pages with a success rate of 92.3%. Once stored in a database, the extracted schedules can serve as a schedule information system for intermodal transportation network design.
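The abstract names what the Schedule Crawler and Web Robot do but not how. The following Python sketch is one possible reading under stated assumptions, not the authors' implementation: link extraction uses the standard library's `html.parser` and `urllib.parse`, change detection compares SHA-256 digests of page contents, and the schedule-page test is a simple hypothetical heuristic (schedule-like words, several dates, and a table). The cue words, thresholds, and every class and function name (`PageScanner`, `scan`, `looks_like_schedule_page`, `find_schedule_pages`, `ChangeMonitor`) are illustrative assumptions; page fetching is left to a caller-supplied `fetch` callback.

```python
import hashlib
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Hypothetical cue words and date format; the authors' real heuristics are not given.
SCHEDULE_KEYWORDS = ("schedule", "vessel", "voyage", "etd", "eta", "port")
DATE_PATTERN = re.compile(r"\b\d{4}[-/.]\d{1,2}[-/.]\d{1,2}\b")


class PageScanner(HTMLParser):
    """Collects absolute link URLs, visible text, and a table-row count from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self.table_rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL and drop "#fragment" parts.
                url, _ = urldefrag(urljoin(self.base_url, href))
                if url.startswith(("http://", "https://")):
                    self.links.append(url)
        elif tag == "tr":
            self.table_rows += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def scan(html, base_url):
    """Parses one HTML document and returns the populated scanner."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    return scanner


def looks_like_schedule_page(page, min_keywords=3, min_dates=2, min_rows=2):
    """Hypothetical heuristic: schedule-like words, several dates, and a table."""
    text = " ".join(page.text_parts).lower()
    words = set(re.findall(r"[a-z]+", text))  # whole words, so "port" won't match "report"
    keyword_hits = sum(kw in words for kw in SCHEDULE_KEYWORDS)
    date_hits = len(DATE_PATTERN.findall(text))
    return (keyword_hits >= min_keywords and date_hits >= min_dates
            and page.table_rows >= min_rows)


def find_schedule_pages(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text or None (caller-supplied)."""
    queue, seen, found, visited = deque([start_url]), {start_url}, [], 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        html = fetch(url)
        if html is None:
            continue  # unreachable or non-HTML page
        page = scan(html, url)
        if looks_like_schedule_page(page):
            found.append(url)
        for link in page.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return found


class ChangeMonitor:
    """Web Robot side: remembers a SHA-256 digest per URL to detect changed pages."""

    def __init__(self):
        self._digests = {}

    def has_changed(self, url, html):
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        changed = self._digests.get(url) != digest  # True on first visit, too
        self._digests[url] = digest
        return changed
```

Because `fetch` is injected, the crawl can be exercised against an in-memory dict of pages (for example `find_schedule_pages("https://example.com/", pages.get)`) before a real HTTP client is attached. A production robot would additionally need to honor robots.txt, limit its request rate, and parse table cells into schedule records, none of which this sketch covers.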