Improving pseudo-relevance feedback in web information retrieval using web page segmentation
WWW '03 Proceedings of the 12th international conference on World Wide Web
Eliminating noisy information in Web pages for data mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining data records in Web pages
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Automatic web news extraction using tree edit distance
Proceedings of the 13th international conference on World Wide Web
The volume and evolution of web page templates
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
A fast and robust method for web page template detection and removal
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Layout Based Information Extraction from HTML Documents
ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02
On Finding Templates on Web Collections
World Wide Web
A fast and simple method for extracting relevant content from news webpages
Proceedings of the 18th ACM conference on Information and knowledge management
Information extraction for search engines using fast heuristic techniques
Data & Knowledge Engineering
Bridging the gap: from multi document Template Detection to single document Content Extraction
EuroIMSA '08 Proceedings of the IASTED International Conference on Internet and Multimedia Systems and Applications
ViDE: A Vision-Based Approach for Deep Web Data Extraction
IEEE Transactions on Knowledge and Data Engineering
Hi-index | 0.00 |
Segmenting a web page may be one of initial steps of information retrieval or content classification performed on that page. While there has been an extensive research in this area, the approaches usually focus either on performance or quality of the results. Vision based segmentation is one of the quality focused methods, which are considerably slow. This paper proposes an approach for boosting the performance of vision based algorithms. Our approach is based on concepts of modern web and a very common scenario in which an entire web site is processed at once. In this scenario, a great amount of performance boost can be gained by isomorphic mapping of previous results gathered from pages within the site to other pages on the same site. We provide the results of experiments performed on VIPS, the most common algorithm for page segmentation.