An On-Line Document Clustering Method Based on Forgetting Factors

Authors:
Yoshiharu Ishikawa;Yibing Chen;Hiroyuki Kitagawa
Venue:
ECDL '01 Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries
Year:
2001

Abstract

With the rapid development of on-line information services, technologies for on-line information processing have been receiving much attention. Clustering plays an important role in various on-line applications, such as extracting useful information from news feed services and selecting relevant documents from the scientific articles arriving at digital libraries. In on-line environments, users are generally more interested in newer documents than in older ones and have no interest in obsolete documents. Based on this observation, we propose an on-line document clustering method, F2ICM (Forgetting-Factor-based Incremental Clustering Method), that incorporates the notion of a forgetting factor into the calculation of document similarities. The idea is that every document gradually loses its weight (or memory) as time passes, according to this factor. Since F2ICM generates clusters using a document similarity measure based on the forgetting factor, newer documents have a greater effect on the resulting cluster structure than older ones. In this paper, we present the fundamental idea of the F2ICM method and describe its details, such as the similarity measure and the clustering algorithm. We also show an efficient incremental statistics maintenance method for F2ICM, which is indispensable in dynamic on-line environments.
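
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, assuming an exponential decay in which a document's weight is multiplied by a constant factor per unit of time. The class name DecayedTermStats, the half_life parameter, and the whitespace tokenizer are hypothetical choices made for illustration and are not taken from the paper. The sketch also shows why such statistics can be maintained incrementally: when time advances, every stored total is scaled by the same factor, so older documents never need to be revisited.

```python
class DecayedTermStats:
    """Time-decayed term totals (illustrative sketch, not the paper's method)."""

    def __init__(self, half_life):
        if half_life <= 0:
            raise ValueError("half_life must be positive")
        # Assumed decay model: weight = lam ** age, with lam chosen so that
        # a document's weight halves every `half_life` time units.
        self.lam = 0.5 ** (1.0 / half_life)
        self.now = 0.0
        self.totals = {}  # term -> sum of decayed document weights

    def advance(self, new_time):
        """Move the clock forward and decay all stored totals at once."""
        if new_time < self.now:
            raise ValueError("time cannot move backwards")
        decay = self.lam ** (new_time - self.now)
        for term in self.totals:
            self.totals[term] *= decay
        self.now = new_time

    def add(self, text):
        """Add a document acquired at the current time (weight 1.0)."""
        for term in text.lower().split():
            self.totals[term] = self.totals.get(term, 0.0) + 1.0


stats = DecayedTermStats(half_life=7.0)
stats.add("market news")      # acquired at time 0
stats.advance(7.0)            # one half-life later
stats.add("market update")    # acquired at time 7
for term, total in sorted(stats.totals.items()):
    print(f"{term}: {total:.3f}")
```

In this sketch the older document contributes about half as much as the newer one to the total for "market", which is the kind of recency bias the abstract describes. How the actual method turns such weights into a similarity measure and a clustering algorithm is described in the paper itself and is not reproduced here.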