Business information extraction from semi-structured webpages

  • Authors:
  • Nahk Hyun Sung; Yong Sik Chang

  • Affiliations:
  • Department of Management Information Systems, Yong-In University, 470 Samga-dong, Yongin, Kyungki 449-714, South Korea; Department of e-Business, Hanshin University, 411 Yangsan-dong, Osan, Kyungki 447-791, South Korea

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2004

Abstract

To protect online consumers, as the OECD Guidelines recommend, Internet shopping malls should provide information about their business on their webpages. In Korea, the Consumer Protection Law in Electronic Commerce requires Internet shopping malls to provide their business information so that consumers can easily identify them. Because most Korean Internet shopping malls present this business information in a semi-structured format on their homepages, a software agent can readily locate and identify it. To automatically investigate whether Internet shopping malls provide this business information, this article proposes methods for gathering the URLs of Internet shopping malls, for monitoring changes to their webpages, and for extracting business information. Business information extraction in our research is based on synonyms and indicator words for each attribute. We used inductive learning to improve the efficiency of information extraction. Our experiments demonstrate the potential of the agent system, which achieved an average extraction accuracy of 89.3%.
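The abstract does not spell out how synonyms and indicator words are combined, so the following is only a minimal sketch of synonym-based attribute extraction under stated assumptions. It assumes the business information appears as "label: value" pairs separated by "|" or line breaks. The synonym lists, the attribute names, and the value patterns (standing in for indicator words) are all hypothetical illustrations, not the authors' actual lexicon. The inductive learning step is not modeled.

```python
import re
from typing import Dict, List

# Hypothetical synonym lists: labels that may introduce each attribute on a page.
SYNONYMS: Dict[str, List[str]] = {
    "company_name": ["company name", "trade name", "상호"],
    "representative": ["representative", "ceo", "대표자", "대표"],
    "registration_no": ["business registration number", "business reg. no", "사업자등록번호"],
    "phone": ["tel", "phone", "전화"],
}

# Hypothetical value patterns used to confirm a candidate value
# (the registration-number format is an assumption: 3-2-5 digits).
VALUE_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "registration_no": re.compile(r"\d{3}-\d{2}-\d{5}"),
    "phone": re.compile(r"\d{2,4}-\d{3,4}-\d{4}"),
}

# Assumed layout: segments separated by "|" or newlines, labels followed by a colon.
SEGMENT_SPLIT = re.compile(r"[|\n]+")
LABEL_SEP = re.compile(r"\s*[:：]\s*")


def extract_business_info(text: str) -> Dict[str, str]:
    """Map each recognized label to its attribute and keep the first valid value."""
    # Reverse index from normalized label to attribute name.
    label_to_attr = {syn.lower(): attr for attr, syns in SYNONYMS.items() for syn in syns}
    result: Dict[str, str] = {}
    for segment in SEGMENT_SPLIT.split(text):
        parts = LABEL_SEP.split(segment.strip(), maxsplit=1)
        if len(parts) != 2:
            continue  # segment has no "label: value" structure
        label, value = parts[0].strip().lower(), parts[1].strip()
        attr = label_to_attr.get(label)
        if attr is None or attr in result:
            continue  # unknown label, or attribute already extracted
        pattern = VALUE_PATTERNS.get(attr)
        if pattern is not None:
            match = pattern.search(value)
            if match is None:
                continue  # value does not look like this attribute
            value = match.group(0)
        result[attr] = value
    return result


if __name__ == "__main__":
    footer = ("Company Name: Example Mall | CEO: Hong Gil-dong | "
              "Business Registration Number: 123-45-67890\nTel: 02-123-4567")
    print(extract_business_info(footer))
```

Matching labels exactly after lowercasing keeps the sketch simple; a real agent would likely need fuzzier label matching and learned rules to cope with the variety of page layouts the abstract alludes to.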