A Novelty-based Clustering Method for On-line Documents

Authors:
Sophoin Khy;Yoshiharu Ishikawa;Hiroyuki Kitagawa
Affiliations:
Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan 305-8573;Information Technology Center, Nagoya University, Chikusa-ku, Japan 464-8601;Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan 305-8573 and Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan 305-8573
Venue:
World Wide Web
Year:
2008


Abstract

In this paper, we describe a document clustering method called novelty-based document clustering. The method clusters documents based on both similarity and novelty: it assigns higher weights to recent documents than to older ones and therefore generates clusters that focus on recent topics. The similarity function is derived probabilistically; it extends the conventional cosine measure of the vector space model by incorporating a document forgetting model, which yields novelty-based clusters. The clustering procedure is a variation of the K-means method. An additional feature of the method is an incremental update facility, which is applied when new documents are added to a document repository. The performance of the clustering method is evaluated through experiments, and the experimental results show that the method is both efficient and effective.
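The abstract names four mechanisms: recency weighting through a forgetting model, a cosine-based similarity that incorporates those weights, a K-means-style clustering loop, and incremental updates when new documents arrive. The Python sketch below shows one way these pieces can fit together, under clearly stated assumptions. The exponential decay model, the `half_life` parameter, the product-of-weights similarity, the "most recent documents as seeds" rule, and every function name (`forgetting_weight`, `advance_time`, `novelty_similarity`, `weighted_kmeans`) are hypothetical and chosen only for illustration. The paper derives its own similarity function probabilistically, and its exact formulas and update procedure may differ.

```python
import math
from typing import Dict, List

# Sparse term-weight vector (e.g. TF-IDF scores keyed by term).
Vector = Dict[str, float]


def forgetting_weight(age: float, half_life: float) -> float:
    """Assumed exponential forgetting model: weight = lam ** age, where lam is
    chosen so that a document's weight halves every `half_life` time units."""
    lam = 0.5 ** (1.0 / half_life)  # per-unit decay factor, 0 < lam < 1
    return lam ** age


def advance_time(weights: List[float], delta: float, half_life: float) -> List[float]:
    """Incremental update: because lam**(a + delta) == lam**a * lam**delta,
    stored weights are rescaled by one common factor instead of recomputed."""
    factor = forgetting_weight(delta, half_life)
    return [w * factor for w in weights]


def cosine(u: Vector, v: Vector) -> float:
    """Conventional vector-space cosine similarity."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def novelty_similarity(u: Vector, v: Vector, wu: float, wv: float) -> float:
    """Hypothetical novelty-aware similarity: the cosine scaled by both
    documents' forgetting weights, so pairs of old documents score low."""
    return wu * wv * cosine(u, v)


def weighted_kmeans(docs: List[Vector], weights: List[float], k: int,
                    iters: int = 10) -> List[int]:
    """K-means-style loop in which recent (heavier) documents pull centroids harder."""
    if not 0 < k <= len(docs):
        raise ValueError("k must be between 1 and len(docs)")
    # Seed with the k most recent (highest-weight) documents: an assumed choice.
    seeds = sorted(range(len(docs)), key=lambda i: -weights[i])[:k]
    centroids: List[Vector] = [dict(docs[i]) for i in seeds]
    labels = [0] * len(docs)
    for _ in range(iters):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
        # Update step: each centroid becomes the forgetting-weighted sum of its members.
        for c in range(k):
            acc: Vector = {}
            for d, w, lab in zip(docs, weights, new_labels):
                if lab == c:
                    for t, x in d.items():
                        acc[t] = acc.get(t, 0.0) + w * x
            if acc:  # keep the previous centroid if the cluster emptied out
                centroids[c] = acc
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Usage with made-up data: two topics, one of which has only an old document.
docs = [
    {"election": 1.0, "vote": 1.0},
    {"election": 1.0, "poll": 1.0},
    {"storm": 1.0, "rain": 1.0},
    {"storm": 1.0, "flood": 1.0},
]
ages = [30.0, 1.0, 2.0, 0.0]  # hypothetical days since each document arrived
weights = [forgetting_weight(a, half_life=7.0) for a in ages]
print(weighted_kmeans(docs, weights, k=2))  # [1, 1, 0, 0]
```

Exponential decay is a convenient choice for the incremental case: when the clock advances by Δ, every stored weight is multiplied by the same factor λ^Δ, so existing documents do not have to be reweighted from their original timestamps. The 30-day-old "election" document still joins its topical cluster, but its small weight means the cluster centroid is dominated by the recent "poll" document.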