Identifying a hierarchy of bipartite subgraphs for web site abstraction

Authors:
William K. Cheung;Yuxiang Sun
Affiliations:
(Corresponding author e-mail: william@comp.hkbu.edu.hk) Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong;Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Venue:
Web Intelligence and Agent Systems
Year:
2007

Abstract

The Web is transforming from a mere information dissemination platform into a distributed knowledge-based platform that supports complex problem solving. However, much of the knowledge on the existing Web is tagged only with layout-related markup, which makes it hard to discover and use. In this paper, we propose to model the semantically rich, self-contained knowledge units embedded in a web site as a mixture of bipartite subgraphs, and to extract those subgraphs as an abstraction of the web site through analysis of its hyperlink structure and file hierarchy. We derive a recursive algorithm, named ReHITS, that identifies bipartite subgraphs with a hierarchical organization. Each identified subgraph contains a set of associated authorities and hubs that serves as its summarized semantic description. The algorithm has been evaluated on three real web sites (containing about 10,000 web pages) with promising results. A detailed interpretation of the experimental results and a qualitative comparison with related work are also included.
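
For readers unfamiliar with hubs and authorities, the following is a minimal sketch of the classic HITS iteration (Kleinberg's hub/authority scoring), which the name ReHITS suggests the method builds on. It is not the ReHITS algorithm itself: the abstract gives no details of the recursion or of the file hierarchy analysis, so neither is shown. The function names `hits` and `normalize`, the iteration count, and the toy page names are all assumptions made for illustration.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Classic HITS on a directed graph given as (source, target) link pairs.

    Returns (hub, authority) score dictionaries. The iteration count is an
    arbitrary choice for this sketch, not a value taken from the paper.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub score: sum of the updated authority scores of the pages it links to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical toy site: two index pages (hubs) linking to three content pages.
toy_edges = [
    ("index_a", "p1"), ("index_a", "p2"),
    ("index_b", "p2"), ("index_b", "p3"),
]
hub_scores, auth_scores = hits(toy_edges)
print(max(auth_scores, key=auth_scores.get))
```

In this toy graph the index pages and the content pages form a single bipartite subgraph: the index pages receive all of the hub weight, and the content page linked from both of them receives the highest authority score.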