Text and hypertext categorization

Authors:
Houda Benbrahim;Max Bramer
Affiliations:
Ernst and Young LLP, London, United Kingdom;School of Computing, University of Portsmouth, Portsmouth, United Kingdom
Venue:
Artificial intelligence
Year:
2009

Abstract

Automatic categorization of text documents has become an important area of research in the last two decades, with features that make it significantly more difficult than the traditional classification tasks studied in machine learning. A more recent development is the need to classify hypertext documents, most notably web pages. These have features that add further complexity to the categorization task but also offer the possibility of using information that is not available in standard text classification, such as metadata and the content of the web pages that point to and are pointed at by a web page of interest. This chapter surveys the state of the art in text categorization and hypertext categorization, focussing particularly on issues of representation that differentiate them from 'conventional' classification tasks and from each other.