With the growth of the internet and computing, the amount of information available online is increasing rapidly, and most of it is managed in document form. Methods for managing large document collections effectively are therefore needed. Document clustering groups a set of documents by subject according to the similarity among them. It can therefore be used for browsing and searching documents, and it can improve search accuracy. This paper proposes an efficient incremental clustering algorithm for document sets that grow gradually. The algorithm assigns each set of new documents to the legacy clusters that have already been identified. In addition, to improve clustering correctness, stop words are removed and term weights are calculated by the proposed TF × NIDF function. The performance of the proposed method is analyzed through a series of experiments that identify its various characteristics.
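
As a rough illustration of the pipeline outlined in the abstract (stop-word removal, term weighting, and assignment of new documents to existing clusters), the following Python sketch may help. It is not the paper's implementation. The abstract does not define NIDF, so the code assumes it means an IDF value normalized to the range [0, 1]. The stop-word list, the cosine similarity measure, the summed-vector centroids, the 0.2 threshold, and all function names are likewise hypothetical choices made for the example.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def document_frequencies(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def weigh(text, doc_freq, n_docs):
    """Weight each term by TF x normalized IDF.

    Assumed normalization: idf / log(1 + n_docs), which lies in [0, 1].
    Requires n_docs >= 1.
    """
    log_n = math.log(1 + n_docs)
    weights = {}
    for term, tf in Counter(tokenize(text)).items():
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0)))
        weights[term] = tf * idf / log_n
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign(vec, centroids, threshold=0.2):
    """Return the index of the most similar cluster, or None if none reaches the threshold."""
    best, best_sim = None, threshold
    for i, centroid in enumerate(centroids):
        sim = cosine(vec, centroid)
        if sim >= best_sim:
            best, best_sim = i, sim
    return best


def add_document(vec, centroids, threshold=0.2):
    """Fold vec into its best existing cluster, or open a new cluster; return the index."""
    idx = assign(vec, centroids, threshold)
    if idx is None:
        centroids.append(dict(vec))
        return len(centroids) - 1
    # Centroids are kept as summed vectors; cosine similarity ignores their scale.
    for term, w in vec.items():
        centroids[idx][term] = centroids[idx].get(term, 0.0) + w
    return idx


if __name__ == "__main__":
    legacy = [
        "document clustering methods",
        "document clustering algorithm",
        "football match score",
        "football league match",
    ]
    df = document_frequencies(legacy)
    centroids = []
    for doc in legacy:
        add_document(weigh(doc, df, len(legacy)), centroids)
    new_vec = weigh("incremental clustering algorithm", df, len(legacy))
    print("assigned to cluster:", add_document(new_vec, centroids))
```

In this sketch the document frequencies are computed once from the legacy collection and reused for new documents, so existing cluster vectors do not have to be re-weighted as documents arrive. Whether the paper makes the same choice is not stated in the abstract.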