Reasoning for web document associations and its applications in site map construction

Authors:
K. Selçuk Candan;Wen-Syan Li
Affiliations:
C & C Research Laboratories - Silicon Valley, NEC USA Inc., 10080 North Wolfe Road, Suite SW3-350, Cupertino, CA;C & C Research Laboratories - Silicon Valley, NEC USA Inc., 10080 North Wolfe Road, Suite SW3-350, Cupertino, CA
Venue:
Data & Knowledge Engineering
Year:
2002

Citing 14
Cited 11

Inferring Web communities from link topology

Proceedings of the ninth ACM conference on Hypertext and hypermedia: links, objects, time and space---structure in hypermedia systems
Improved algorithms for topic distillation in a hyperlinked environment

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Automatic resource compilation by analyzing hyperlink structure and associated text

WWW7 Proceedings of the seventh international conference on World Wide Web 7
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Finding related pages in the World Wide Web

WWW '99 Proceedings of the eighth international conference on World Wide Web
Trawling the Web for emerging cyber-communities

WWW '99 Proceedings of the eighth international conference on World Wide Web
Mirror, mirror on the Web: a study of host pairs with replicated content

WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment

Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Defining logical domains in a web site

HYPERTEXT '00 Proceedings of the eleventh ACM on Hypertext and hypermedia
Integrating content search with structure analysis for hypermedia retrieval and management

ACM Computing Surveys (CSUR)
The stochastic approach for link-structure analysis (SALSA) and the TKC effect

Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking
What is this page known for? Computing Web page reputations

Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking
Constructing multi-granular and topic-focused web site maps

Proceedings of the 10th international conference on World Wide Web
SALSA: the stochastic approach for link-structure analysis

ACM Transactions on Information Systems (TOIS)

Unsupervised Link Discovery in Multi-relational Data via Rarity Analysis

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Topic segmentation of message hierarchies for indexing and navigation support

WWW '05 Proceedings of the 14th international conference on World Wide Web
Finding and classifying web units in websites

International Journal of Business Intelligence and Data Mining
SEA: Segment-enrich-annotate paradigm for adapting dialog-based content for improved accessibility

ACM Transactions on Information Systems (TOIS)
Leveraging structural knowledge for hierarchically-informed keyword weight propagation in the web

WebKDD'06 Proceedings of the 8th Knowledge discovery on the web international conference on Advances in web mining and web usage analysis
SBV-Cut: Vertex-cut based graph partitioning using structural balance vertices

Data & Knowledge Engineering
Automatically constructing descriptive site maps

APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
A rule based approach to message board topics classification

MIS'05 Proceedings of the 11th international conference on Advances in Multimedia Information Systems
Web classification of conceptual entities using co-training

Expert Systems with Applications: An International Journal
Hive open research network platform

Proceedings of the 16th International Conference on Extending Database Technology
LR-PPR: locality-sensitive, re-use promoting, approximate personalized pagerank computation

Proceedings of the 22nd ACM international conference on Information & knowledge management

Abstract

Recently, there has been interest in using associations between Web pages to provide users with pages relevant to what they are currently viewing. We believe that, to enable intelligent decisions, we need to answer the question "given a set of pages, why are they associated?" We present a framework for reasoning about Web document associations. We start from the observation that the reasons for Web page associations are implicitly embedded in the content of the pages as well as in the links connecting them. The association reasoning scheme we propose is based on a random walk algorithm that takes both link structure and content into consideration and allows users to specify a focus. We then show how the proposed algorithm, combined with a logical domain identification technique, can be used for Web site summarization and Web site map construction to help users navigate complex corporate sites. To achieve this goal, it is essential to recover the Web authors' intentions and superimpose them on the users' retrieval contexts when summarizing Web sites. We therefore present a framework that uses logical neighborhoods, entry pages, and associations of entry pages to create context-sensitive summaries and maps of complex Web sites.
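
The abstract does not give the details of the random walk, so the following is only a generic sketch of the idea it names: a walk over the link graph whose restart step is biased toward pages whose content matches a user-specified focus. It is not the authors' algorithm. The function name `random_walk_scores`, the `damping` parameter, the `relevance` weights, and the toy site are all assumptions made for illustration.

```python
def random_walk_scores(links, relevance, damping=0.85, iterations=100):
    """Score pages with a random walk that restarts at focus-relevant pages.

    links:     dict mapping a page to the list of pages it links to
    relevance: dict mapping a page to a non-negative content-relevance
               weight for the user's focus (missing pages count as 0)
    Both inputs are hypothetical; a real system would derive them from
    crawled pages and a text-matching step.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    if not pages:
        return {}

    # Restart distribution: proportional to content relevance,
    # or uniform when no page matches the focus at all.
    total = sum(relevance.get(p, 0.0) for p in pages)
    if total > 0:
        restart = {p: relevance.get(p, 0.0) / total for p in pages}
    else:
        restart = {p: 1.0 / len(pages) for p in pages}

    scores = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps to a focus page.
        nxt = {p: (1.0 - damping) * restart[p] for p in pages}
        for p in pages:
            out = links.get(p, [])
            if out:
                # Otherwise it follows one of the page's links uniformly.
                share = damping * scores[p] / len(out)
                for q in out:
                    nxt[q] += share
            else:
                # Pages without outgoing links hand their mass to the restart set.
                for q in pages:
                    nxt[q] += damping * scores[p] * restart[q]
        scores = nxt
    return scores


if __name__ == "__main__":
    # Toy site: "products" matches the focus, "about" does not.
    site = {"home": ["products", "about"], "products": ["home"], "about": ["home"]}
    for page, score in sorted(random_walk_scores(site, {"products": 1.0}).items()):
        print(f"{page}: {score:.3f}")
```

In this sketch the link structure enters through the transition step and the content enters through the restart distribution, which is one common way to combine the two signals; the paper may combine them differently.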