Most research on text categorization has focused on classifying text documents into a set of categories with no structural relationships among them (flat classification). In many information repositories, however, documents are organized in a hierarchy of categories to support thematic search by browsing topics of interest. Taking the hierarchical relationships among categories into account raises several additional issues in the development of methods for automated document classification. These concern the representation of documents, the learning process, the classification process, and the criteria for evaluating experimental results. This paper investigates them systematically. Its main contribution is a general hierarchical text categorization framework in which the hierarchy of categories is involved in all phases of automated document classification, namely feature selection, learning, and classification of a new document. An automated threshold determination method for classification scores is embedded in the proposed framework; it can be applied to any classifier that returns a degree of membership of a document in a category. Three learning methods are considered for the construction of document classifiers: centroid-based, naïve Bayes, and SVM. The proposed framework has been implemented in the system WebClassIII and tested on three datasets (Yahoo, DMOZ, RCV1) that present a variety of situations in terms of hierarchical structure. Experimental results are reported, and several conclusions are drawn from the comparison of the flat and hierarchical approaches as well as from the comparison of different hierarchical classifiers. The paper concludes with a review of related work and a discussion of how previous findings relate to ours.
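To make the idea of threshold-driven hierarchical classification concrete, the following is a minimal sketch in Python. It is not the WebClassIII implementation and does not reproduce the paper's threshold determination method. All names (`Category`, `keyword_scorer`, `pick_threshold`, `classify_top_down`) are hypothetical, the keyword-overlap scorer merely stands in for any classifier that returns a degree of membership, and the accuracy-based threshold choice is an assumed placeholder for an automated procedure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Doc = Dict[str, float]  # assumed document representation: term -> weight


@dataclass
class Category:
    name: str
    scorer: Callable[[Doc], float]  # any function returning a membership degree
    threshold: float = 0.5          # minimum score needed to enter this category
    children: List["Category"] = field(default_factory=list)


def keyword_scorer(keywords: Set[str]) -> Callable[[Doc], float]:
    """Toy scorer (an assumption): share of document weight on the keywords."""
    def score(doc: Doc) -> float:
        total = sum(doc.values())
        if total == 0:
            return 0.0
        return sum(w for term, w in doc.items() if term in keywords) / total
    return score


def pick_threshold(scores: List[float], labels: List[bool]) -> float:
    """Placeholder threshold selection: the observed score that classifies
    the most validation examples correctly (lowest such score on ties)."""
    def correct(t: float) -> int:
        return sum((s >= t) == y for s, y in zip(scores, labels))
    return max(sorted(set(scores)), key=correct)


def classify_top_down(doc: Doc, root: Category) -> str:
    """Descend from the root while some child accepts the document."""
    current = root
    while current.children:
        scored = [(child.scorer(doc), child) for child in current.children]
        accepted = [(s, c) for s, c in scored if s >= c.threshold]
        if not accepted:
            break  # no child passes its threshold: stop at an internal node
        current = max(accepted, key=lambda pair: pair[0])[1]
    return current.name


football = Category("football", keyword_scorer({"goal", "team"}))
tennis = Category("tennis", keyword_scorer({"racket", "serve"}))
sports = Category(
    "sports",
    keyword_scorer({"match", "team", "goal", "racket"}),
    children=[football, tennis],
)
science = Category("science", keyword_scorer({"atom", "cell", "energy"}))
root = Category("root", lambda doc: 1.0, threshold=0.0, children=[sports, science])

print(classify_top_down({"team": 2.0, "goal": 1.0, "weather": 1.0}, root))
print(classify_top_down({"match": 2.0, "weather": 2.0}, root))
```

In this sketch the first document descends to a leaf, while the second stops at the internal node `sports` because neither child reaches its threshold. Allowing a document to remain at an internal category is one design choice among several; a flat classifier would instead score all leaf categories directly and ignore the hierarchy.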