Development of an agent system to collect schedule information on the web for intermodal transportation network planning

Authors:
Hyung Rim Choi;Hyun Soo Kim;Nam Kyu Park;Byung Joo Park;Moo Hong Kang;Jae Un Jeung
Affiliations:
Department of Management Information Systems, Dong-a University, Saha-gu, Busan, South Korea;Department of Management Information Systems, Dong-a University, Saha-gu, Busan, South Korea;Department of Distribution Management, Tongmyong University, Nam-gu, Busan, South Korea;Department of Management Information Systems, Dong-a University, Saha-gu, Busan, South Korea;Department of Management Information Systems, Dong-a University, Saha-gu, Busan, South Korea;Department of Management Information Systems, Dong-a University, Saha-gu, Busan, South Korea
Venue:
CEA'07 Proceedings of the 2007 annual Conference on International Conference on Computer Engineering and Applications
Year:
2007

Abstract

Cutting logistics costs through new transportation network designs has recently become a major issue for logistics-related firms. However, designing an intermodal transportation network is difficult because the operating schedule information needed for the design is scattered. Solving this problem requires a technology that can collect and provide the schedule data of all transportation modes. In this research, an agent system that can identify and collect schedule information provided through the Internet was developed. The schedule-collecting agent system consists of two components: the Schedule Crawler, which extracts URLs (Uniform Resource Locators) from HTML (Hyper-Text Markup Language) documents, finds HTML pages, and identifies schedule-providing pages; and the Web Robot, which detects data changes on the identified schedule-providing pages, extracts schedule data, and saves the extracted data in the schedule database. In addition, an algorithm and heuristics that recognize only the schedule information on a given Web page were developed. To evaluate the performance of the system, an experiment was carried out using four shipping companies and 135 HTML pages providing shipping schedule information. The experiment showed a 99.8% schedule information page identification rate and a 92.3% schedule data extraction success rate on those pages. Stored in a database, the extracted schedules can serve as a schedule information system for intermodal transportation network design.
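
The abstract does not give implementation details, so the following is only a minimal Python sketch of two steps it names: extracting URLs from an HTML document (the Schedule Crawler) and detecting that a schedule page has changed (the Web Robot). The names `LinkExtractor`, `extract_links`, `page_fingerprint`, and `has_changed` are hypothetical, and comparing content hashes is an assumed change-detection technique, not necessarily the one the authors used.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the HTML text, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def page_fingerprint(html):
    """Hash the page content (assumed technique for change detection)."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def has_changed(html, previous_fingerprint):
    """True if the page content differs from the stored fingerprint."""
    return page_fingerprint(html) != previous_fingerprint
```

In such a setup, a crawler would call `extract_links` on each fetched page to discover candidate pages, and a robot would store `page_fingerprint` for each identified schedule page and re-extract the schedule data only when `has_changed` reports a difference.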