MenuMiner: revealing the information architecture of large web sites by analyzing maximal cliques

Authors:
Matthias Keller; Martin Nussbaumer
Affiliations:
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany (both authors)
Venue:
Proceedings of the 21st international conference companion on World Wide Web
Year:
2012

Abstract

The foundation of almost every web site's information architecture is a hierarchical content organization. Information architects therefore put much effort into designing taxonomies that structure the content in a comprehensible and sound way. Human users readily perceive these taxonomies through the site's system of main menus and submenus. Current methods of web structure mining, however, cannot extract these central aspects of the information architecture, because they cannot interpret the visual encoding to recognize menus and their rank as humans do. In this paper we show that a web site's main navigation system can be distinguished not only by visual features but also by certain structural characteristics of the HTML tree and the web graph. We have developed a reliable and scalable solution to the problem of extracting menus for mining the information architecture. The novel MenuMiner algorithm retrieves the original content organization of large-scale web sites. Such data are valuable for many applications, e.g., the presentation of search results. In an experiment, we applied the method to find site boundaries within a large domain. The evaluation showed that the method reliably delivers menus and site boundaries where other current approaches fail.
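The title states that MenuMiner works by analyzing maximal cliques, but the abstract does not say how. The Python below is only an illustrative sketch under our own assumption: pages that share a main menu all link to one another, so they form a clique in the graph of reciprocal links. It enumerates maximal cliques with the classic Bron-Kerbosch algorithm (with pivoting) and treats the largest one as the main-menu candidate. The names `mutual_link_graph`, `maximal_cliques`, and the toy `links` site are hypothetical. This is not the authors' MenuMiner algorithm, which according to the abstract also draws on features of the HTML tree.

```python
from typing import Dict, List, Set

# Hypothetical input format: page URL -> set of URLs it links to.
Links = Dict[str, Set[str]]


def mutual_link_graph(links: Links) -> Dict[str, Set[str]]:
    """Build an undirected graph that keeps only reciprocal links (a -> b and b -> a)."""
    graph: Dict[str, Set[str]] = {page: set() for page in links}
    for src, targets in links.items():
        for dst in targets:
            # Ignore self-links and links to pages outside the crawl.
            if dst != src and src in links.get(dst, set()):
                graph[src].add(dst)
                graph[dst].add(src)
    return graph


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques with Bron-Kerbosch plus pivoting."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates that extend it, x: already-explored vertices.
        if not p and not x:
            cliques.append(r)  # nothing can extend r, so it is maximal
            return
        # Pivot on the vertex with most neighbours in p to skip redundant branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Toy site: four main-menu pages that all link to each other, plus two
# product pages that repeat the main menu and link to each other (a submenu).
links: Links = {
    "home": {"products", "about", "contact"},
    "products": {"home", "about", "contact", "p1", "p2"},
    "about": {"home", "products", "contact"},
    "contact": {"home", "products", "about"},
    "p1": {"home", "products", "about", "contact", "p2"},
    "p2": {"home", "products", "about", "contact", "p1"},
}

cliques = maximal_cliques(mutual_link_graph(links))
main_menu = max(cliques, key=len)  # assumption: the largest clique is the main menu
print(sorted(main_menu))  # ['about', 'contact', 'home', 'products']
```

On this toy site the largest maximal clique is the four main-menu pages, while the smaller clique {products, p1, p2} overlaps it in `products`, hinting at a submenu below that entry. Real menus are seldom perfect cliques (missing links, pages without the menu), so this sketch only conveys the structural intuition, not a robust extractor.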