Recognition of Common Areas in a Web Page Using Visual Information: a possible application in a page classification

Authors:
Milos Kovacevic; Michelangelo Diligenti; Marco Gori; Veljko Milutinovic
Venue:
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Year:
2002



Abstract

Extracting and processing information from Web pages is an important task in many areas, such as constructing search engines, information retrieval, and data mining from the Web. A common approach in the extraction process is to represent a page as a "bag of words" and then to perform additional processing on such a flat representation. In this paper we propose a new, hierarchical representation that includes browser screen coordinates for every HTML object in a page. Using visual information, one is able to define heuristics for the recognition of common page areas such as the header, left and right menus, footer, and center of a page. We show in initial experiments that, using our heuristics, the defined objects are recognized properly in 73% of cases. Finally, we show that a Naive Bayes classifier, taking into account the proposed representation, clearly outperforms the same classifier using only information about the content of documents.
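
The coordinate-based heuristics described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Box` type, the `classify_area` function, and all threshold values (`header_px`, `footer_px`, `side_frac`) are hypothetical names and numbers chosen for illustration only. It assumes each HTML object already has a rendered bounding box in browser pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rendered bounding box of one HTML object, in browser pixels (hypothetical type)."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def classify_area(box, page_width, page_height,
                  header_px=200.0, footer_px=150.0, side_frac=0.3):
    """Assign a box to one of five common page areas.

    All thresholds are illustrative assumptions, not values taken from the paper.
    """
    bottom = box.y + box.height
    right = box.x + box.width
    if bottom <= header_px:                       # lies entirely in the top strip
        return "header"
    if box.y >= page_height - footer_px:          # starts inside the bottom strip
        return "footer"
    if right <= side_frac * page_width:           # lies entirely in the left strip
        return "left_menu"
    if box.x >= (1.0 - side_frac) * page_width:   # starts inside the right strip
        return "right_menu"
    return "center"


# Usage on a made-up 1000 x 2000 pixel page layout.
page_w, page_h = 1000.0, 2000.0
objects = {
    "logo": Box(0, 0, 1000, 120),
    "nav": Box(0, 200, 250, 900),
    "article": Box(260, 200, 500, 1500),
    "ads": Box(780, 200, 220, 600),
    "copyright": Box(0, 1900, 1000, 100),
}
labels = {name: classify_area(b, page_w, page_h) for name, b in objects.items()}
```

A classifier could then treat the area label as extra context for each word (for example, weighting words from the center differently from words in a menu); how the paper combines the two is not stated in the abstract, so that step is left out of the sketch.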