Evaluating contents-link coupled web page clustering for web search results

Authors:
Yitong Wang; Masaru Kitsuregawa
Affiliations:
The University of Tokyo, Tokyo, Japan; The University of Tokyo, Tokyo, Japan
Venue:
Proceedings of the eleventh international conference on Information and knowledge management
Year:
2002

Abstract

Clustering is one of the most important techniques for handling the massive amount of heterogeneous information on the web, for example when locating resources or interpreting information. Unlike clustering in other fields, web page clustering separates unrelated pages and groups pages related to a specific topic into semantically meaningful clusters, which is useful for discriminating, summarizing, organizing, and navigating unstructured web pages. We have proposed a contents-link coupled clustering algorithm that clusters web pages by combining content analysis and link analysis. In this paper, we study the effects of out-links (from a web page), in-links (to a web page), and terms on the final clustering results, as well as how to combine these three components effectively to improve clustering quality. We apply the algorithm to the clustering of web search results. Preliminary experiments and evaluations are conducted on various topics. The experimental results show that the proposed clustering algorithm is effective and promising.
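
To make the idea of combining out-links, in-links, and terms concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes each page is described by three sets (`out`, `in`, `terms`), scores a pair of pages with a weighted sum of per-component cosine similarities, and groups pages with a simple greedy single-pass scheme. The names `cosine`, `combined_similarity`, and `cluster`, the equal weights, and the threshold of 0.3 are all hypothetical choices made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def combined_similarity(p, q, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of out-link, in-link, and term similarity.

    The weights are an assumption for illustration, not values from the paper.
    """
    w_out, w_in, w_term = weights
    return (
        w_out * cosine(p["out"], q["out"])
        + w_in * cosine(p["in"], q["in"])
        + w_term * cosine(p["terms"], q["terms"])
    )


def cluster(pages, threshold=0.3, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Greedy single-pass clustering (an illustrative stand-in technique).

    Each page joins the cluster whose seed (first member) is most similar,
    provided the similarity reaches the threshold; otherwise the page
    starts a new cluster. Returns lists of page indices.
    """
    clusters = []
    for i, page in enumerate(pages):
        best, best_sim = None, threshold
        for members in clusters:
            sim = combined_similarity(page, pages[members[0]], weights)
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


# Hypothetical search results for an ambiguous query.
pages = [
    {"out": {"x", "y"}, "in": {"h1"}, "terms": {"jaguar", "car"}},
    {"out": {"x", "y"}, "in": {"h1"}, "terms": {"jaguar", "car"}},
    {"out": {"z"}, "in": {"h2"}, "terms": {"jaguar", "cat"}},
]
print(cluster(pages))
```

In this toy input the third page shares only one term with the others, so its combined score stays below the threshold and it forms its own cluster. Changing `weights` shows how much each of the three components contributes to the grouping, which is the kind of effect the abstract says the paper studies.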