What makes template content on the Web so special that we need to remove it? In this paper, I present a large-scale aggregate analysis of textual Web content that corroborates statistical laws from the field of Quantitative Linguistics. I analyze the idiosyncrasy of template content compared to regular "full text" content and derive a simple yet suitable quantitative model.