Concluding pattern of web page based on string pattern matching

  • Authors:
  • Yiqing Cai;Xinjun Wang;Chunsheng Lu;Zhongmin Yan;Zhaohui Peng

  • Affiliations:
  • School of Computer Science and Technology, Shandong University, Jinan, China;School of Computer Science and Technology, Shandong University, Jinan, China;Information Center of Ministry of Human Resources and Social Security of the People's Republic of China;School of Computer Science and Technology, Shandong University, Jinan, China;School of Computer Science and Technology, Shandong University, Jinan, China

  • Venue:
  • WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Presently, each Web site has its own topics and formats to arrange the page structure and present information. Therefore, there is a great need for value-added service that extracts information from multiple sources. Data extraction from HTML is usually performed by software modules called wrappers. In many studies of constructing wrapper, concluding the pattern of the Web site is a importance task in the beginning. This paper studies the problem of concluding pattern from a Web page that contains several nested structure and repeated structure. In our method, the algorithm bases on string pattern matching can discover the nested structure and the repeated structure in a Web page. Then a regular expression will be generated as the pattern of the Web site.