Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
A probabilistic learning approach for document indexing
ACM Transactions on Information Systems (TOIS) - Special issue on research and development in information retrieval
An evaluation of phrasal and clustered representations on a text categorization task
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Classifying news stories using memory based reasoning
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Trading MIPS and memory for knowledge engineering
Communications of the ACM
Representation and learning in information retrieval
Representation and learning in information retrieval
Automatic indexing based on Bayesian inference networks
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Automated learning of decision rules for text categorization
ACM Transactions on Information Systems (TOIS)
An example-based mapping method for text categorization and retrieval
ACM Transactions on Information Systems (TOIS)
Optimizing confidence of text classification by evolution of symbolic expressions
Advances in genetic programming
OHSUMED: an interactive retrieval evaluation and new large test collection for research
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Improving text retrieval for the routing problem using latent semantic indexing
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
The nature of statistical learning theory
The nature of statistical learning theory
A comparison of classifiers and document representations for the routing problem
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Combining classifiers in text categorization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Context-sensitive learning methods for text categorization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Feature selection, perceptron learning, and a usability case study for text categorization
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Autonomous document classification for business
AGENTS '97 Proceedings of the first international conference on Autonomous agents
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Machine Learning - Special issue on learning with probabilistic representations
Inductive learning algorithms and representations for text categorization
Proceedings of the seventh international conference on Information and knowledge management
Using a generalized instance set for automatic text categorization
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Automatic essay grading using text categorization techniques
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Distributional clustering of words for text classification
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A technique for measuring the relative size and overlap of public Web search engines
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Foundations of statistical natural language processing
Foundations of statistical natural language processing
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchical neural networks for text categorization (poster abstract)
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A patent search and classification system
Proceedings of the fourth ACM conference on Digital libraries
Feature selection in SVM text categorization
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Text classification using ESC-based stochastic decision lists
Proceedings of the eighth international conference on Information and knowledge management
Automatic Document Classification
Journal of the ACM (JACM)
Bringing order to the Web: automatically categorizing search results
Proceedings of the SIGCHI conference on Human Factors in Computing Systems
Probabilistic learning for selective dissemination of information
Information Processing and Management: an International Journal
Adaptive information filtering using evolutionary computation
Information Sciences: an International Journal - Special issue on frontiers in evolutionary algorithms
Text filtering by boosting naive Bayes classifiers
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
BoosTexter: A Boosting-based Systemfor Text Categorization
Machine Learning - Special issue on information retrieval
Proceedings of the ninth international conference on Information and knowledge management
Analyzing the effectiveness and applicability of co-training
Proceedings of the ninth international conference on Information and knowledge management
A vector space model for automatic indexing
Communications of the ACM
Text databases & document management
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
A Study of Approaches to Hypertext Categorization
Journal of Intelligent Information Systems
Maximizing Text-Mining Performance
IEEE Intelligent Systems
Machine Learning
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Hierarchically Classifying Documents Using Very Few Words
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Discovering Test Set Regularities in Relational Domains
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Applying an existing machine learning algorithm to text categorization
Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing
Feature Reduction for Neural Network Based Text Categorization
DASFAA '99 Proceedings of the Sixth International Conference on Database Systems for Advanced Applications
Employing EM and Pool-Based Active Learning for Text Classification
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
The VLDB Journal — The International Journal on Very Large Data Bases
Feature selection and feature extraction for text categorization
HLT '91 Proceedings of the workshop on Speech and Natural Language
The automatic creation of literature abstracts
IBM Journal of Research and Development
The Journal of Supercomputing
Hi-index | 0.01 |
Automatic categorization of text documents has become an important area of research in the last two decades, with features that make it significantly more difficult than the traditional classification tasks studied in machine learning. A more recent development is the need to classify hypertext documents, most notably web pages. These have features that add further complexity to the categorization task but also offer the possibility of using information that is not available in standard text classification, such as metadata and the content of the web pages that point to and are pointed at by a web page of interest. This chapter surveys the state of the art in text categorization and hypertext categorization, focussing particularly on issues of representation that differentiate them from 'conventional' classification tasks and from each other.