This paper describes a new method for classifying an HTML document into a hierarchy of categories. The hierarchy is used in every phase of automated document classification: feature extraction, learning, and classification of a new document. The innovative aspects of this work are the feature selection process, the automated determination of thresholds on classification scores, and an experimental study on real-world Web documents that can be associated with any node in the hierarchy. In addition, a new measure for evaluating system performance is introduced in order to compare three techniques: flat, hierarchical with proper training sets, and hierarchical with hierarchical training sets. The method has been implemented in a client-server application named WebClassII. Results show that hierarchical techniques perform better when trained on hierarchical training sets.
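
To make the idea of threshold-based classification into any node of a hierarchy concrete, the following is a minimal Python sketch. It is not the WebClassII implementation: the `Category` class, the keyword-overlap `score` function, and the fixed threshold values are all hypothetical stand-ins chosen for illustration, whereas the paper learns its classifiers and determines thresholds automatically. The sketch only shows the control flow in which a document descends the hierarchy while some child category scores at or above its threshold, and otherwise remains at the current (possibly internal) node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in the category hierarchy (hypothetical structure)."""
    name: str
    keywords: set[str] = field(default_factory=set)  # assumed per-node features
    threshold: float = 0.0                           # assumed per-node score threshold
    children: list[Category] = field(default_factory=list)


def score(doc_tokens: set[str], category: Category) -> float:
    """Toy score: fraction of the category's keywords found in the document."""
    if not category.keywords:
        return 0.0
    return len(doc_tokens & category.keywords) / len(category.keywords)


def classify(doc_tokens: set[str], root: Category) -> Category:
    """Descend from the root while some child scores at or above its threshold."""
    current = root
    while current.children:
        passing = [c for c in current.children
                   if score(doc_tokens, c) >= c.threshold]
        if not passing:
            break  # no child is confident enough: the document stays here
        current = max(passing, key=lambda c: score(doc_tokens, c))
    return current


# A small illustrative hierarchy (category names and keywords are made up).
physics = Category("physics", {"quantum", "particle", "energy"}, 0.5)
science = Category("science", {"research", "experiment", "theory"}, 0.5, [physics])
sports = Category("sports", {"match", "team", "score"}, 0.5)
root = Category("root", children=[science, sports])

print(classify({"research", "experiment", "quantum", "particle"}, root).name)
print(classify({"research", "theory", "history"}, root).name)
print(classify({"cooking"}, root).name)
```

In this sketch the first document reaches the leaf `physics`, the second stops at the internal node `science` because no child passes its threshold, and the third stays at the root. Stopping at internal nodes is what lets a document be associated with any node in the hierarchy rather than only with leaves.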