Document clustering based on web search hit counts

Authors:
Masaya Kaneko;Shusuke Okamoto;Masaki Kohana;You Inayoshi
Affiliations:
Graduate School of Science and Technology, Seikei University, 3-3-1, Kichijoji-Kitamachi, Musashino-shi, Tokyo, 180-8633, Japan;Graduate School of Science and Technology, Seikei University, 3-3-1, Kichijoji-Kitamachi, Musashino-shi, Tokyo, 180-8633, Japan;Graduate School of Science and Technology, Seikei University, 3-3-1, Kichijoji-Kitamachi, Musashino-shi, Tokyo, 180-8633, Japan;Graduate School of Science and Technology, Seikei University, 3-3-1, Kichijoji-Kitamachi, Musashino-shi, Tokyo, 180-8633, Japan
Venue:
International Journal of Business Intelligence and Data Mining
Year:
2013

Abstract

This paper describes a web mining method for automatically clustering research documents. Web hit counts of AND searches on pairs of words are used to form a document feature vector. The target documents are then clustered by applying the k-means method twice, with cosine similarity as the distance measure.
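
The abstract does not give implementation details, so the following Python sketch is only one possible reading of the idea, not the authors' method. It assumes each document is represented by a single keyword, that the feature vector holds the AND-search hit counts of that keyword paired with each word in a fixed list, and that k-means assigns points to the centroid with the highest cosine similarity. The names hit_count, FAKE_HITS, feature_vector and kmeans_cosine are hypothetical, the hit counts are made-up placeholders rather than real search results, and the second k-means pass mentioned in the abstract is not reproduced because the abstract does not say how it is applied.

```python
import math
import random

# Made-up placeholder counts standing in for a real search engine query.
FAKE_HITS = {
    ("kmeans", "data"): 900, ("kmeans", "network"): 10,
    ("clustering", "data"): 800, ("clustering", "network"): 20,
    ("tcp", "data"): 15, ("tcp", "network"): 950,
    ("routing", "data"): 5, ("routing", "network"): 700,
}


def hit_count(word_a, word_b):
    """Hypothetical stand-in for the hit count of the query 'word_a AND word_b'."""
    return FAKE_HITS.get((word_a, word_b), 0)


def feature_vector(doc_word, basis_words):
    """Build one feature vector from AND-search hit counts (assumed layout)."""
    return [float(hit_count(doc_word, w)) for w in basis_words]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def kmeans_cosine(vectors, k, iterations=20, seed=0):
    """k-means that assigns each vector to its most cosine-similar centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            max(range(k), key=lambda c: cosine_similarity(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


doc_words = ["kmeans", "clustering", "tcp", "routing"]
basis_words = ["data", "network"]
vectors = [feature_vector(w, basis_words) for w in doc_words]
labels = kmeans_cosine(vectors, k=2)
for word, label in zip(doc_words, labels):
    print(word, label)
```

Because cosine similarity ignores vector length, two keywords with very different total hit counts can still fall into the same cluster when their counts are spread across the word list in similar proportions.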