A vector space model for automatic indexing
Communications of the ACM
Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
Learning block importance models for web pages
Proceedings of the 13th international conference on World Wide Web
Combining DOM tree and geometric layout analysis for online medical journal article segmentation
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
BlockWeb: An IR Model for Block Structured Web Pages
CBMI '09 Proceedings of the 2009 Seventh International Workshop on Content-Based Multimedia Indexing
Extracting content structure for web pages based on visual representation
APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications
Hi-index | 0.00 |
We present in this paper a model that we have developed for indexing and querying web pages based on their visual rendering. In this model pages are split up into a set of visual blocks. The indexing of a block takes into account its content, its visual importance and, by permeability, the indexing of neighbors blocks. A page is modeled as a directed acyclic graph. Each node is associated with a block and labeled by the coefficient of importance of this block. Each edge is labeled by the coefficient of permeability of the target node content to the source node content. Importance and permeability coefficients cannot be manually quantified. the second part of this paper, we present an experiment consisting in learning optimal permeability coefficients by gradient descent for indexing images of a web page from the text blocks of this page. The dataset is drawn from real web pages of the train and test set of the ImagEval task2 corpus. Results demonstrate an improvement of the indexing using non uniform block permeabilities.