Repetition-based web page segmentation by detecting tag patterns for small-screen devices

Authors:
J. Kang; J. Yang; J. Choi
Affiliations:
Department of Computer Science and Engineering, Hanyang University
Venue:
IEEE Transactions on Consumer Electronics
Year:
2010

Abstract

Segmenting a Web page into logical blocks is an important preprocessing step for recognizing the informative content blocks in a page, which in turn enables efficient information extraction and convenient display on devices with small-sized screens. Previous methods for Web page segmentation are not flexible in a dynamic Web environment because they rely largely on heuristic rules generated by exploiting the structural tags and visual information inherent in a page. To resolve this problem, this paper proposes a new method of Web page segmentation that recognizes repetitive tag patterns, called key patterns, in the DOM tree structure of a page. We report on the Repetition-based Page Segmentation (REPS) algorithm, which detects key patterns in a page and generates virtual nodes to correctly segment nested blocks. A series of experiments performed on real Web sites showed that REPS greatly contributes to improving the correctness of Web page segmentation.
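
The general idea of segmenting by repeated tag patterns can be illustrated with a small sketch. This is not the REPS algorithm itself, whose details the abstract does not give. It is a simplified stand-in that assumes a key pattern can be approximated by the run of consecutively repeated child tags covering the most children of one DOM node. The names `Node`, `TreeBuilder`, `parse`, `find_repeated_pattern`, and `segment_children` are hypothetical, and the grouped blocks are only loosely analogous to the virtual nodes mentioned above.

```python
from html.parser import HTMLParser

# Elements that never have children; a small assumed subset for this sketch.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """Minimal DOM-like node that keeps only the tag name and structure."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects from HTML, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_repeated_pattern(tags, min_repeats=2):
    """Return (pattern, start, repeats) for the consecutively repeated tag
    run that covers the most items, or None if nothing repeats."""
    best = None
    n = len(tags)
    for length in range(1, n // min_repeats + 1):
        for start in range(0, n - length * min_repeats + 1):
            pattern = tags[start:start + length]
            repeats = 1
            pos = start + length
            while tags[pos:pos + length] == pattern:
                repeats += 1
                pos += length
            covered = repeats * length
            if repeats >= min_repeats and (best is None or covered > best[3]):
                best = (tuple(pattern), start, repeats, covered)
    return best[:3] if best else None


def segment_children(node):
    """Group a node's children into blocks, one block per pattern repetition.
    Children before or after the repeated run become their own blocks."""
    tags = [child.tag for child in node.children]
    found = find_repeated_pattern(tags)
    if found is None:
        return [node.children]
    pattern, start, repeats = found
    size = len(pattern)
    blocks = []
    if start > 0:
        blocks.append(node.children[:start])
    for i in range(repeats):
        lo = start + i * size
        blocks.append(node.children[lo:lo + size])
    end = start + repeats * size
    if end < len(node.children):
        blocks.append(node.children[end:])
    return blocks


if __name__ == "__main__":
    page = ("<div><h2>A</h2><p>a</p><h2>B</h2><p>b</p>"
            "<h2>C</h2><p>c</p><span>x</span></div>")
    div = parse(page).children[0]
    for block in segment_children(div):
        print([child.tag for child in block])
```

In this sketch a flat sequence such as `h2, p, h2, p, h2, p, span` is split into three `h2, p` blocks plus a trailing `span` block, even though the page has no wrapper element around each heading and paragraph pair. Handling nested blocks, as the abstract describes for REPS, would require applying the same idea recursively through the tree, which the sketch leaves out.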