Query based optimal web site clustering using simulated annealing

Authors:
Wookey Lee;Young Kuk Kim;Bok Sik Yoon;Jiang Jin Xi
Affiliations:
Inha University, Incheon, Korea;Chungnam National University, Daejeon, Korea;Hongik University, Seoul, Korea;Yanbian University, Jilin Province, China
Venue:
Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services
Year:
2008

Abstract

Clustering is a viable technique for dealing with the scale of web documents, but it is known to be a complicated combinatorial optimization problem. Because it is hard to develop a generally applicable optimal algorithm for web document clustering and classification, a simulated annealing algorithm is developed instead. The web document classification problem is framed as finding the best match between a web query and a hypothesized web object. The normalized term frequency-inverse document frequency (TF-IDF) coefficient is used as the measure of the match. Test beds are generated online during the search by transforming web sites. As a result, web sites can be clustered optimally in terms of the keyword vectors of their corresponding web documents.
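
The two ingredients named in the abstract, a normalized TF-IDF match measure and a simulated annealing search over cluster assignments, can be illustrated with a minimal sketch. This is not the authors' algorithm: the smoothed IDF formula, the objective (within-cluster similarity divided by cluster size), the single-document move, the geometric cooling schedule, and all function and parameter names are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

def tfidf_vectors(docs):
    """L2-normalized TF-IDF vectors (term -> weight) for non-empty token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF, log(1 + n/df), is an assumption of this sketch.
        vec = {t: (c / len(doc)) * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def score(labels, sim, k):
    """Sum over clusters of (total pairwise similarity) / (cluster size)."""
    total = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            total += sum(sim[i][j] for i in members for j in members) / len(members)
    return total

def anneal_clusters(vectors, k, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Assign each vector to one of k clusters by simulated annealing."""
    rng = random.Random(seed)
    n = len(vectors)
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    labels = [i % k for i in range(n)]       # arbitrary starting assignment
    current = score(labels, sim, k)
    best, best_score = labels[:], current
    temp = t0
    for _ in range(steps):
        i, new = rng.randrange(n), rng.randrange(k)
        old = labels[i]
        labels[i] = new                      # propose moving document i
        candidate = score(labels, sim, k)
        delta = candidate - current
        # Always accept improvements; accept worse moves with probability
        # exp(delta / temp), which shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if current > best_score:
                best, best_score = labels[:], current
        else:
            labels[i] = old                  # reject and restore
        temp *= cooling
    return best, best_score

# Hypothetical toy corpus: two topics with disjoint vocabularies.
docs = [s.split() for s in [
    "web page link graph", "web link graph rank", "web page graph rank",
    "protein gene cell dna", "gene cell dna enzyme", "protein cell dna enzyme",
]]
vectors = tfidf_vectors(docs)
labels, value = anneal_clusters(vectors, k=2)
```

The objective is recomputed from scratch at every step, which keeps the sketch short but costs O(n^2) per move; a practical version would update the score incrementally. A query can be scored against the same vectors with `cosine` after being tokenized and weighted in the same way as the documents.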