The information architecture of almost every web site is built on a hierarchical content organization. Information architects therefore put much effort into designing taxonomies that structure the content in a comprehensible and sound way. Human users can read these taxonomies directly from the site's system of main menus and submenus. Current methods of web structure mining, however, cannot extract these central aspects of the information architecture, because they cannot interpret the visual encoding to recognize menus and their rank as humans do. In this paper we show that a web site's main navigation system can be distinguished not only by visual features but also by certain structural characteristics of the HTML tree and the web graph. We have developed a reliable and scalable solution to the problem of extracting menus for mining the information architecture. The novel MenuMiner algorithm retrieves the original content organization of large-scale web sites. These data are valuable for many applications, e.g. the presentation of search results. In an experiment we applied the method to find site boundaries within a large domain. The evaluation showed that the method reliably delivers menus and site boundaries where other current approaches fail.
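The abstract does not state which structural characteristics of the web graph are used. As an illustration only, the sketch below assumes one plausible characteristic: pages listed in the same menu all link to one another, so they form a clique in the graph of mutual links. It enumerates maximal cliques with the basic Bron-Kerbosch recursion and keeps the larger ones as menu candidates. The function names, the `min_size` threshold, and the toy link data are hypothetical and are not taken from the paper; this is not the MenuMiner algorithm itself.

```python
from typing import Dict, FrozenSet, List, Set


def mutual_link_graph(links: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph that keeps only reciprocal links (assumption)."""
    pages = set(links)
    for targets in links.values():
        pages |= targets
    graph: Dict[str, Set[str]] = {page: set() for page in pages}
    for src, targets in links.items():
        for dst in targets:
            # Ignore self-links; keep the edge only if dst links back to src.
            if src != dst and src in links.get(dst, set()):
                graph[src].add(dst)
                graph[dst].add(src)
    return graph


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Enumerate all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended any further
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}  # v is done: remove it from the candidates
            x = x | {v}  # and exclude it from later branches

    expand(set(), set(graph), set())
    return cliques


def menu_candidates(links: Dict[str, Set[str]], min_size: int = 3) -> List[FrozenSet[str]]:
    """Return maximal cliques of mutually linked pages with at least min_size pages."""
    graph = mutual_link_graph(links)
    return [clique for clique in maximal_cliques(graph) if len(clique) >= min_size]


# Hypothetical toy site: every page shows the same four-item main menu.
MENU = {"/", "/products", "/about", "/contact"}
toy_links: Dict[str, Set[str]] = {
    "/": set(MENU),
    "/products": MENU | {"/products/a", "/products/b"},
    "/about": set(MENU),
    "/contact": set(MENU),
    "/products/a": set(MENU),
    "/products/b": set(MENU),
}

candidates = menu_candidates(toy_links)
```

On this toy data the four menu pages are the only group of three or more mutually linked pages, so they are the only candidate. The product detail pages link to the menu but are linked back only from `/products`, which leaves them in two-page cliques that the size threshold drops. The basic recursion takes exponential time in the worst case, so a large site would need a pivoting variant or a library implementation.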