Boosting and Rocchio applied to text filtering
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating evaluation measure stability
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
A study of thresholding strategies for text categorization
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Maximum likelihood estimation for filtering thresholds
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Building a filtering test collection for TREC 2002
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Mining Ontology for Automatically Acquiring Web User Information Needs
IEEE Transactions on Knowledge and Data Engineering
Deploying Approaches for Pattern Refinement in Text Mining
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Introduction to Information Retrieval
Introduction to Information Retrieval
Search Engines: Information Retrieval in Practice
Search Engines: Information Retrieval in Practice
Multilabel classification with meta-level features
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Mining positive and negative patterns for relevance feature discovery
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
A big challenge for classification on text is the noisy of text data. It makes classification quality low. Many classification process can be divided into two sequential steps scoring and threshold setting (thresholding). Therefore to deal with noisy data problem, it is important to describe positive feature effectively scoring and to set a suitable threshold. Most existing text classifiers do not concentrate on these two jobs. In this paper, we propose a novel text classifier with pattern-based scoring that describe positive feature effectively, followed by threshold setting. The thresholding is based on score of training set, make it is simple to implement in other scoring methods. Experiment shows that our pattern-based classifier is promising.