An optimized approach for KNN text categorization using P-trees

Authors:
Imad Rahal; William Perrizo
Affiliations:
North Dakota State University, Fargo, ND; North Dakota State University, Fargo, ND
Venue:
Proceedings of the 2004 ACM symposium on Applied computing
Year:
2004



Abstract

Text mining matters because text databases now hold huge volumes of valuable information waiting to be mined. Text categorization is the process of assigning categories or labels to documents based entirely on their contents. Formally, it can be viewed as a mapping from the document space into a set of predefined class labels (also known as subjects or categories): F: D → {C1, C2, ..., Cn}, where F is the mapping function, D is the document space, and {C1, C2, ..., Cn} is the set of class labels. Given an unlabeled document d, we need to find its class label Ci using the mapping function F, where F(d) = Ci. This paper proposes an optimized k-Nearest Neighbors (KNN) classifier that uses intervalization and P-tree technology to achieve a high degree of accuracy, space utilization, and time efficiency: as new samples arrive, the classifier finds the k nearest neighbors of each new sample in the training space without a single database scan.
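
To make the mapping F(d) = Ci concrete, the following is a minimal sketch of a plain KNN text classifier. It is not the P-tree method the paper proposes: it scans every training document for each query, which is exactly the cost the paper aims to avoid, and it omits intervalization entirely. The tokenizer, the cosine similarity measure, the function names, and the toy training set are all assumptions made for illustration.

```python
from collections import Counter
import math


def term_frequencies(text):
    """Lowercase, split on whitespace, and count terms (a deliberately crude tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(document, training_set, k=3):
    """F(d) = Ci: return the majority label among the k most similar training documents."""
    query = term_frequencies(document)
    # Assumption: a full scan of the training set, unlike the scan-free P-tree approach.
    scored = [
        (cosine_similarity(query, term_frequencies(text)), label)
        for text, label in training_set
    ]
    # Sort by similarity only, highest first; ties keep training order (the sort is stable).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (document text, class label) pairs.
TRAINING_SET = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the match ended in a draw", "sports"),
    ("the parliament passed the budget bill", "politics"),
    ("the senate debated the new bill", "politics"),
    ("voters elected a new parliament", "politics"),
]

print(classify("the striker scored in the football match", TRAINING_SET, k=3))  # prints "sports"
```

Here D is the set of all strings, the class labels are {"sports", "politics"}, and `classify` plays the role of F. Majority voting over the k neighbors is one common choice; weighting each vote by similarity would be a reasonable alternative.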