Real-world applications of text categorization often require a system to handle tens of thousands of categories defined over a large taxonomy. This paper addresses the problem for a set of popular text categorization algorithms: Support Vector Machines (SVM), k-nearest neighbor (kNN), ridge regression, linear least squares fit, and logistic regression. We provide a formal analysis of the computational complexity of each classification method, then investigate how the different classifiers are used in a hierarchical categorization setting, and show how the scalability of a method depends on the topology of the hierarchy and the category distributions. In addition, we obtain tight bounds on the complexities by using a power law to approximate category distributions over a hierarchy. Experiments with kNN and SVM classifiers on the OHSUMED corpus are reported as concrete examples.
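As a rough illustration of why a power-law approximation of category sizes is useful for cost estimates, the sketch below is an assumption-laden toy and not the paper's analysis. It assumes the r-th largest category holds `largest * r**(-alpha)` documents and that training one classifier on n documents costs `n**exponent` abstract units. The function names, the parameter values, and the cost model are all hypothetical.

```python
def power_law_sizes(num_categories, largest, alpha):
    """Hypothetical category sizes: the r-th largest has largest * r**(-alpha) documents."""
    return [largest * r ** (-alpha) for r in range(1, num_categories + 1)]


def total_cost(sizes, exponent):
    """Sum of per-category costs, assuming one classifier costs size**exponent units."""
    return sum(n ** exponent for n in sizes)


# Illustrative values only; they are not taken from the paper or from OHSUMED.
sizes = power_law_sizes(num_categories=10_000, largest=50_000, alpha=1.0)

linear = total_cost(sizes, 1.0)      # cost model linear in training-set size
quadratic = total_cost(sizes, 2.0)   # cost model quadratic in training-set size

# Share of the quadratic cost contributed by the ten largest categories.
head_share = total_cost(sizes[:10], 2.0) / quadratic

print(f"linear total:    {linear:.3e}")
print(f"quadratic total: {quadratic:.3e}")
print(f"top-10 share of quadratic cost: {head_share:.1%}")
```

Under these assumptions, a handful of large categories accounts for most of the superlinear cost, which is one way to see why the shape of the category distribution matters for scalability.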