Information about how a Web page is segmented can be used by applications such as segment-aware Web search, classification, and link analysis. In this work, we propose a fully automatic method for page segmentation and evaluate it through experiments with four separate Web sites. While the method may be used in other applications, our main focus in this article is its use as input to segment-aware Web search systems. Our results indicate that the proposed method produces better segmentation results than the best segmentation method we found in the literature. Further, when used as input to a segment-aware Web search method, it produces results close to those obtained with manual page segmentation.