An incremental document clustering for the large document database

Authors:
Kil Hong Joo;Won Suk Lee
Affiliations:
Dept. of Computer Education, Gyeongin National University of Education, Gyeyang-gu, Incheon, Korea;Dept. of Computer Science, Yonsei University, Seoul, Korea
Venue:
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
Year:
2005

Citing 15
Cited 0

Algorithms for clustering data

Algorithms for clustering data
Term-weighting approaches in automatic text retrieval

Information Processing and Management: an International Journal
Information retrieval: data structures and algorithms

Information retrieval: data structures and algorithms
Scatter/Gather: a cluster-based approach to browsing large document collections

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Expert network: effective and efficient learning from human decisions in text categorization and retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Pivoted document length normalization

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Training algorithms for linear text classifiers

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Incremental clustering and dynamic information retrieval

STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
Web document clustering: a feasibility demonstration

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Using a generalized instance set for automatic text categorization

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Information Retrieval

Information Retrieval
Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Combining Statistical and Relational Methods for Learning in Hypertext Domains

ILP '98 Proceedings of the 8th International Workshop on Inductive Logic Programming
Iterative optimization and simplification of hierarchical clusterings

Journal of Artificial Intelligence Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

With the development of the internet and computer, the amount of information through the internet is increasing rapidly and it is managed in document form. For this reason, the research into the method to manage for a large amount of document in an effective way is necessary. The document clustering is integrated documents to subject by classifying a set of documents through their similarity among them. Accordingly, the document clustering can be used in exploring and searching a document and it can increase accuracy of search. This paper proposes an efficient incremental clustering algorithm for a set of documents increase gradually. The incremental document clustering algorithm assigns a set of new documents to the legacy clusters which have been identified in advance. In addition, to improve the correctness of the clustering, removing the stop words can be proposed and the weight of the word can be calculated by the proposed TF × NIDF function. In this paper, the performance of the proposed method is analyzed by a series of experiments to identify their various characteristics.