In this paper, the problem of classifying HTML documents is investigated in the context of a client-server application, named WebClass, developed to support the search activity of a geographically distributed group of people with common interests. The two main issues studied are the selection of features to represent HTML documents and the construction of the classifiers. A new feature selection technique is presented, and its interaction with different classifiers is studied experimentally. Results show that performance improves even with simple classifiers and that the proposed feature selection technique compares favorably with other well-known approaches.