Thresholding strategies in automated text categorization are an underexplored area of research. This paper examines the effect of thresholding strategies on the performance of a classifier under various conditions. Using k-Nearest Neighbor (kNN) as the classifier and five evaluation benchmark collections as the testbeds, three common thresholding methods were investigated: rank-based thresholding (RCut), proportion-based assignment (PCut), and score-based local optimization (SCut). In addition, new variants of these methods are proposed to overcome significant problems in the existing approaches. Experimental results show that the choice of thresholding strategy can significantly influence the performance of kNN, and that the "optimal" strategy may vary by application. SCut is potentially better for fine-tuning but risks overfitting. PCut copes better with rare categories and exhibits a smoother trade-off between recall and precision, but is not suitable for online decision making. RCut is the most natural choice for online response but is too coarse-grained for global or local optimization. RTCut, a new method combining the strengths of category ranking and scoring, significantly outperforms both PCut and RCut.
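To make the three baseline strategies concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a classifier has already produced a document-by-category score matrix (`scores[d][c]`). All function names, the parameters `t` and `x`, and in particular the PCut quota `round(prior * x * n_docs)` are illustrative assumptions that may differ from the formulas used in the paper. SCut is shown here tuning one threshold per category to maximize F1 on validation data, which is one possible optimization target. RTCut is not sketched because the abstract does not specify it.

```python
def rcut(scores, t):
    """RCut (rank-based): give each document its t highest-scoring categories."""
    decisions = []
    for doc_scores in scores:
        ranked = sorted(range(len(doc_scores)),
                        key=lambda c: doc_scores[c], reverse=True)
        top = set(ranked[:t])
        decisions.append([c in top for c in range(len(doc_scores))])
    return decisions


def pcut(scores, priors, x):
    """PCut (proportion-based): give each category its top-k documents.

    ASSUMPTION: k = round(prior * x * n_docs), with x a tunable factor.
    It needs the whole batch of documents, so it cannot decide online.
    """
    n_docs, n_cats = len(scores), len(priors)
    decisions = [[False] * n_cats for _ in range(n_docs)]
    for c in range(n_cats):
        k = min(n_docs, int(round(priors[c] * x * n_docs)))
        ranked = sorted(range(n_docs),
                        key=lambda d: scores[d][c], reverse=True)
        for d in ranked[:k]:
            decisions[d][c] = True
    return decisions


def f1(pred, gold):
    """F1 of binary predictions against binary gold labels."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if g and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def scut_fit(val_scores, val_labels):
    """SCut (score-based): pick one threshold per category on validation data.

    Candidates are the observed validation scores; the one with the best F1
    wins (ASSUMPTION: F1 is the target measure). Fitting per category is what
    allows fine-tuning and also what invites overfitting on rare categories.
    """
    n_cats = len(val_scores[0])
    thresholds = []
    for c in range(n_cats):
        col = [row[c] for row in val_scores]
        gold = [row[c] for row in val_labels]
        candidates = sorted(set(col))
        best = max(candidates,
                   key=lambda th: f1([s >= th for s in col], gold))
        thresholds.append(best)
    return thresholds


def scut_apply(scores, thresholds):
    """Assign a category whenever its score reaches that category's threshold."""
    return [[s >= th for s, th in zip(row, thresholds)] for row in scores]


# Toy data: 4 documents, 3 categories (values are made up for illustration).
scores = [[0.9, 0.2, 0.1],
          [0.4, 0.8, 0.3],
          [0.1, 0.3, 0.7],
          [0.6, 0.5, 0.2]]
labels = [[True, False, False],
          [False, True, False],
          [False, False, True],
          [True, False, False]]

rcut_out = rcut(scores, t=1)
pcut_out = pcut(scores, priors=[0.5, 0.25, 0.25], x=1.0)
thresholds = scut_fit(scores, labels)
scut_out = scut_apply(scores, thresholds)
```

In this sketch RCut decides per document and so suits online use, PCut ranks documents per category and therefore needs the full test batch, and SCut only compares a score with a stored threshold at decision time, which mirrors the trade-offs described in the abstract.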