Text mining matters because large text databases hold a wealth of valuable information that has yet to be mined. Text categorization is the process of assigning categories or labels to documents based entirely on their contents. Formally, it can be viewed as a mapping from the document space into a set of predefined class labels (also called subjects or categories): F: D → {C1, C2, ..., Cn}, where F is the mapping function, D is the document space, and {C1, C2, ..., Cn} is the set of class labels. Given an unlabeled document d, the task is to find its class label Ci using the mapping function, so that F(d) = Ci. This paper proposes an optimized k-Nearest Neighbors (KNN) classifier that uses intervalization and P-tree technology to achieve a high degree of accuracy, space utilization, and time efficiency: as new samples arrive, the classifier finds the k nearest neighbors of each new sample in the training space without a single database scan.
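To make the mapping F(d) = Ci concrete, the following is a minimal sketch of a plain KNN text categorizer in Python. It is not the P-tree method proposed here: it scans every training document for each query, uses no intervalization, and picks cosine similarity over raw term counts as an assumed similarity measure. The names `term_vector`, `cosine`, `knn_classify`, and `TRAINING`, along with the toy documents and labels, are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def term_vector(text):
    """Bag-of-words term-frequency vector (hypothetical helper)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(d, training, k=3):
    """Return F(d): the majority label among the k most similar training docs.

    This is a brute-force scan over the training set, which is exactly
    the cost a scan-free index structure is meant to avoid.
    """
    query = term_vector(d)
    ranked = sorted(
        training,
        key=lambda pair: cosine(query, term_vector(pair[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training space: (document, class label) pairs, invented for this sketch.
TRAINING = [
    ("stock market shares trading", "finance"),
    ("bank interest rates market", "finance"),
    ("football match goal score", "sports"),
    ("tennis match player score", "sports"),
]

if __name__ == "__main__":
    print(knn_classify("the market shares rise", TRAINING, k=3))
    print(knn_classify("goal score in the match", TRAINING, k=3))
```

Each call to `knn_classify` recomputes similarities against the whole training set, so the per-query cost grows with the number of training documents.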