Improving linear classifier for Chinese text categorization
Information Processing and Management: an International Journal
In this paper, we use the misclassified news articles in the training data as feedback to improve classification accuracy. We isolate the misclassified articles from their original classes to form new subclasses, and modify the Rocchio linear classifier to build prototype vectors from these subclasses, so that each class can be represented by more than one centroid. We propose two criteria, an error threshold and an entropy threshold, for deciding whether isolating the misclassified articles is worthwhile. Experimental results show that our approaches improve Rocchio's micro-level accuracy and achieve performance similar to kNN with less classification time. In addition, the entropy of the misclassified articles reveals the ambiguity between classes, which lets us sketch a diagram of the relationships between classes as a suggestion for reorganizing the news class structure in the future.
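The feedback idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a centroid-only Rocchio variant (no negative-example term), cosine similarity, sparse vectors stored as dicts, and a single feedback pass. The function names and the exact threshold rules (split a class when its misclassified fraction reaches `error_threshold`; measure ambiguity as the entropy of the labels predicted for the misclassified documents) are hypothetical choices made for illustration only.

```python
import math
from collections import Counter, defaultdict

def centroid(vectors):
    """Mean of sparse vectors, each a dict mapping term -> weight."""
    total = defaultdict(float)
    for v in vectors:
        for term, w in v.items():
            total[term] += w
    return {term: w / len(vectors) for term, w in total.items()}

def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc, prototypes):
    """prototypes: list of (label, centroid); return the nearest label."""
    return max(prototypes, key=lambda p: cosine(doc, p[1]))[0]

def entropy(labels):
    """Shannon entropy (bits) of a list of labels (assumed measure)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def group_by_class(docs, labels):
    by_class = defaultdict(list)
    for d, y in zip(docs, labels):
        by_class[y].append(d)
    return by_class

def initial_prototypes(docs, labels):
    """One centroid per class: the plain single-prototype baseline."""
    return [(y, centroid(ds)) for y, ds in group_by_class(docs, labels).items()]

def refine(docs, labels, prototypes, error_threshold=0.2):
    """One feedback pass: split a class into two centroids when enough
    of its training documents are misclassified (assumed rule)."""
    refined = []
    for y, ds in group_by_class(docs, labels).items():
        preds = [classify(d, prototypes) for d in ds]
        right = [d for d, p in zip(ds, preds) if p == y]
        wrong = [d for d, p in zip(ds, preds) if p != y]
        if right and wrong and len(wrong) / len(ds) >= error_threshold:
            refined.append((y, centroid(right)))  # original subclass
            refined.append((y, centroid(wrong)))  # isolated subclass
        else:
            refined.append((y, centroid(ds)))
    return refined

# Toy data: class "A" contains one outlier that looks like class "B".
docs = [{"x": 1.0}, {"x": 1.0}, {"x": 1.0}, {"y": 1.0, "z": 1.0},
        {"y": 1.0}, {"y": 1.0}]
labels = ["A", "A", "A", "A", "B", "B"]
base = initial_prototypes(docs, labels)
multi = refine(docs, labels, base)
```

In this toy setup, the outlier in class "A" is closer to the single centroid of "B" than to the centroid of its own class, so the baseline misclassifies it; after the feedback pass, "A" is represented by two centroids and the outlier is recovered. Classification cost grows only with the number of prototypes, not with the size of the training set as in kNN.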