Web page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication, and full-text search. In this paper, we describe a new approach to segmenting HTML pages that builds on methods from Quantitative Linguistics and strategies borrowed from Computer Vision. We use text density as a measure to identify the individual text segments of a web page, which reduces the problem to a one-dimensional partitioning task. The distribution of segment-level text density seems to follow a negative hypergeometric distribution, as described by Frumkina's Law. Our extensive evaluation confirms the validity and quality of our approach and its applicability to the Web.
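To make the idea of density-based 1D partitioning concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not define text density, so the sketch assumes it is the number of tokens per line after wrapping a block at a fixed width. The names `text_density` and `fuse_blocks`, the wrap width of 80, and the threshold `max_delta` are all hypothetical.

```python
import textwrap


def text_density(block: str, width: int = 80) -> float:
    """Tokens per wrapped line (an assumed definition of text density).

    `width` is a hypothetical wrap width, not a value taken from the abstract.
    """
    lines = textwrap.wrap(block, width)
    if not lines:
        return 0.0
    return len(block.split()) / len(lines)


def fuse_blocks(blocks, max_delta: float = 0.5):
    """Greedy 1D partitioning over a linear sequence of text blocks.

    A block is merged into the previous segment when the relative
    difference of their densities is at most `max_delta` (an assumed
    threshold); otherwise it starts a new segment.
    """
    segments = []
    for block in blocks:
        if segments:
            prev = segments[-1]
            d_prev, d_cur = text_density(prev), text_density(block)
            denom = max(d_prev, d_cur)
            if denom > 0 and abs(d_prev - d_cur) / denom <= max_delta:
                segments[-1] = prev + " " + block
                continue
        segments.append(block)
    return segments


if __name__ == "__main__":
    # Three short navigation-like blocks followed by two long paragraphs.
    page = [
        "Home",
        "News",
        "Contact",
        " ".join(["word"] * 60),
        " ".join(["text"] * 48),
    ]
    for segment in fuse_blocks(page):
        print(round(text_density(segment), 2), segment[:40])
```

In this sketch, short link-like blocks have low density and long paragraphs have high density, so a single pass over the block sequence is enough to separate them. A real system would first need to derive the block sequence from the HTML, which is left out here.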