Extracting and processing information from Web pages is an important task in many areas, such as constructing search engines, information retrieval, and data mining from the Web. A common approach in the extraction process is to represent a page as a "bag of words" and then to perform additional processing on such a flat representation. In this paper we propose a new, hierarchical representation that includes browser screen coordinates for every HTML object in a page. Using this visual information, one can define heuristics for recognizing common page areas such as the header, the left and right menus, the footer, and the center of a page. Initial experiments show that objects defined by our heuristics are recognized correctly in 73% of cases. Finally, we show that a Naive Bayes classifier that takes the proposed representation into account clearly outperforms the same classifier using only information about the content of documents.
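The abstract does not spell out the heuristics or the classifier features, so the following is only a minimal sketch of the general idea, under stated assumptions. Each rendered HTML object is reduced to a screen rectangle (the hierarchy is flattened to leaf boxes for brevity), a box is assigned to header, footer, left menu, right menu, or center purely from its coordinates, and every word is then prefixed with its area so that a Naive Bayes classifier can tell "left:sports" apart from "center:sports". The names `Box`, `classify_area`, `area_tokens`, and `NaiveBayes`, and the threshold fractions, are hypothetical and are not the values or method used in the paper.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Box:
    """Screen rectangle of one rendered HTML object (pixels, origin at top-left)."""
    x1: float
    y1: float
    x2: float
    y2: float
    text: str = ""


# Hypothetical thresholds, as fractions of the rendered page size.
HEADER_FRAC = 0.15
FOOTER_FRAC = 0.15
SIDE_FRAC = 0.25


def classify_area(box: Box, page_w: float, page_h: float) -> str:
    """Assign a box to a common page area using only its screen coordinates."""
    if box.y2 <= HEADER_FRAC * page_h:
        return "header"
    if box.y1 >= (1 - FOOTER_FRAC) * page_h:
        return "footer"
    if box.x2 <= SIDE_FRAC * page_w:
        return "left"
    if box.x1 >= (1 - SIDE_FRAC) * page_w:
        return "right"
    return "center"


def area_tokens(boxes, page_w, page_h):
    """Prefix each lowercased word with the area it was rendered in, e.g. 'left:sports'."""
    tokens = []
    for b in boxes:
        area = classify_area(b, page_w, page_h)
        tokens.extend(f"{area}:{w.lower()}" for w in b.text.split())
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing over token lists."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, y in zip(docs, labels):
            self.word_counts[y].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for y, count in self.class_counts.items():
            total = sum(self.word_counts[y].values())
            score = math.log(count / n_docs)  # log prior
            for t in tokens:
                # Smoothed log likelihood of each token given the class.
                score += math.log((self.word_counts[y][t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = y, score
        return best


if __name__ == "__main__":
    page = [Box(0, 0, 1000, 100, "News Portal"),
            Box(0, 120, 200, 800, "sports weather"),
            Box(250, 120, 750, 800, "election results today"),
            Box(0, 900, 1000, 1000, "contact copyright")]
    print(area_tokens(page, 1000, 1000))
```

Because the area prefix turns one word into several distinct features, the same vocabulary in a navigation menu and in the main content contributes separately to the class likelihoods; a content-only baseline would simply drop the prefix.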