Previous work shows that a web page can be partitioned into multiple segments or blocks, and that the blocks in a page are often not equally important. It has also been shown that separating noisy and unimportant blocks from the rest of a page can facilitate web mining, search and accessibility. However, no uniform approach or model has been presented for measuring the importance of the different blocks in a web page. Through a user study, we found that people do have a consistent view of the importance of blocks in a web page. We therefore investigate how to build a model that automatically assigns importance values to the blocks in a web page, and we formulate block importance estimation as a learning problem. First, we use a vision-based page segmentation technique to partition a web page into semantic blocks with a hierarchical structure. Next, spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a feature vector for each block. Finally, learning algorithms are used to train a model that assigns an importance value to each block in the web page. In our experiments, the best model achieves a Micro-F1 of 80.2% and a Micro-Accuracy of 86.8%.
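The feature-extraction and learning steps described above can be illustrated with a minimal sketch. It assumes the page has already been segmented into blocks. The `Block` fields, the normalisation by page size, the "low"/"high" labels and the nearest-centroid learner are all hypothetical choices made for illustration; the abstract does not specify the exact features, the importance scale, or the learning algorithms used.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Block:
    """A segmented page block (hypothetical fields, for illustration only)."""
    x: float          # left edge, in pixels
    y: float          # top edge, in pixels
    width: float
    height: float
    num_images: int
    num_links: int


def feature_vector(block, page_width, page_height):
    """Spatial features scaled by page size, followed by raw content counts."""
    return [
        block.x / page_width,
        block.y / page_height,
        block.width / page_width,
        block.height / page_height,
        float(block.num_images),
        float(block.num_links),
    ]


def train_centroids(vectors, labels):
    """Stand-in learner: average the feature vectors of each importance label."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Toy labelled blocks from an assumed 1000 x 2000 pixel page.
PAGE_W, PAGE_H = 1000, 2000
training = [
    (Block(200, 300, 600, 1200, 2, 5), "high"),   # main content
    (Block(220, 250, 560, 1000, 1, 8), "high"),   # main content
    (Block(0, 0, 1000, 80, 1, 20), "low"),        # navigation bar
    (Block(0, 1900, 1000, 100, 0, 15), "low"),    # footer
]
vectors = [feature_vector(b, PAGE_W, PAGE_H) for b, _ in training]
labels = [label for _, label in training]
model = train_centroids(vectors, labels)

new_block = Block(210, 280, 580, 1100, 2, 6)
print(predict(model, feature_vector(new_block, PAGE_W, PAGE_H)))
```

In this toy setup the unscaled link count dominates the distance, so a real system would likely need feature scaling and a stronger learner; the sketch only shows how per-block feature vectors feed a trained model.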