Machine learning in automated text categorisation

Authors:
Fabrizio Sebastiani
Venue:
Machine learning in automated text categorisation
Year:
1999

Citing 0
Cited 23

An improved boosting algorithm and its application to text categorization
Proceedings of the ninth international conference on Information and knowledge management

Using LSI for text classification in the presence of background text
Proceedings of the tenth international conference on Information and knowledge management

Classifying text documents by associating terms with text categories
ADC '02 Proceedings of the 13th Australasian database conference - Volume 5

The use of bigrams to enhance text categorization
Information Processing and Management: an International Journal

Cooperation of Multiple Strategies for Automated Learning in Complex Environments
ISMIS '02 Proceedings of the 13th International Symposium on Foundations of Intelligent Systems

Document Classification and Interpretation through the Inference of Logic-Based Models
ECDL '01 Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries

Dynamic Models of Expert Groups to Recommend Web Documents
ECDL '01 Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries

Combining Multiclass Maximum Entropy Text Classifiers with Neural Network Voting
PorTAL '02 Proceedings of the Third International Conference on Advances in Natural Language Processing

Learning Logic Models for Automated Text Categorization
AI*IA 01 Proceedings of the 7th Congress of the Italian Association for Artificial Intelligence on Advances in Artificial Intelligence

Support vector machine active learning with applications to text classification
The Journal of Machine Learning Research

Using latent semantic indexing to filter spam
Proceedings of the 2003 ACM symposium on Applied computing

Evolving better stoplists for document clustering and web intelligence
Design and application of hybrid intelligent systems

Web Mining: Research and Practice
Computing in Science and Engineering

Feature selection with conditional mutual information maximin in text categorization
Proceedings of the thirteenth ACM international conference on Information and knowledge management

The BankSearch web document dataset: investigating unsupervised clustering and category similarity
Journal of Network and Computer Applications - Special issue on computational intelligence on the internet

A web-based multi-agent system approach to document engineering
International Journal of Web Engineering and Technology

Web Service Search on Large Scale
ICSOC-ServiceWave '09 Proceedings of the 7th International Joint Conference on Service-Oriented Computing

Ranking web documents with dynamic evaluation by expert groups
CAiSE'03 Proceedings of the 15th international conference on Advanced information systems engineering

Constructing maximum entropy language models for movie review subjectivity analysis
Journal of Computer Science and Technology

Improving text similarity measurement by critical sentence vector model
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology

A rule filtering component based on recommendation agent system for classifying email document
PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: applications and Technologies

A machine learning approach to information extraction
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing

Term graph model for text classification
ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications

Abstract

The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to the early '60s. Until the late '80s, the most effective approach to the problem seemed to be that of manually building automatic classifiers by means of knowledge-engineering techniques, i.e. manually defining a set of rules encoding expert knowledge on how to classify documents under a given set of categories. In the '90s, with the booming production and availability of on-line documents, automated text categorisation witnessed increased and renewed interest. This prompted the emergence of the machine learning paradigm for automatic classifier construction, which has definitely superseded the knowledge-engineering approach. Within the machine learning paradigm, a general inductive process (called the learner) automatically builds a classifier (also called the rule, or the hypothesis) by "learning", from a set of previously classified documents, the characteristics of one or more categories. The advantages of this approach are very good effectiveness, considerable savings in expert manpower, and domain independence. In this survey we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues pertaining to document indexing, classifier construction, and classifier evaluation will be discussed in detail. A final section will be devoted to the techniques that have specifically been devised for an emerging application such as the automatic classification of Web pages into "Yahoo!-like" hierarchically structured sets of categories.
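
The contrast the abstract draws between a hand-written rule and a learner that builds a classifier from previously classified documents can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the survey: the two categories, the four training documents, the whitespace tokeniser standing in for document indexing, and the choice of multinomial Naive Bayes with Laplace smoothing as the inductive process.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Crude stand-in for document indexing: lowercase, split on whitespace.
    return text.lower().split()


def rule_classifier(text):
    # Knowledge-engineering style: a hand-written (hypothetical) expert rule.
    return "sport" if "match" in tokenize(text) else "finance"


def learn(training_docs):
    """Learner: builds a classifier from (text, category) pairs.

    Illustrative choice: multinomial Naive Bayes with Laplace smoothing.
    """
    term_counts = defaultdict(Counter)  # category -> term frequencies
    doc_counts = Counter()              # category -> number of training docs
    vocabulary = set()
    for text, category in training_docs:
        terms = tokenize(text)
        term_counts[category].update(terms)
        doc_counts[category] += 1
        vocabulary.update(terms)
    total_docs = sum(doc_counts.values())

    def classifier(text):
        best_category, best_score = None, -math.inf
        for category in doc_counts:
            # Log prior of the category.
            score = math.log(doc_counts[category] / total_docs)
            total_terms = sum(term_counts[category].values())
            for term in tokenize(text):
                if term not in vocabulary:
                    continue  # ignore terms never seen in training
                # Smoothed log likelihood of the term given the category.
                score += math.log(
                    (term_counts[category][term] + 1)
                    / (total_terms + len(vocabulary))
                )
            if score > best_score:
                best_category, best_score = category, score
        return best_category

    return classifier


# Invented, previously classified documents (assumption for illustration).
TRAINING = [
    ("the team won the match", "sport"),
    ("the striker scored a late goal", "sport"),
    ("shares fell as the market closed", "finance"),
    ("the bank raised interest rates", "finance"),
]

classify = learn(TRAINING)
```

With this toy data, `classify("the striker scored")` returns `"sport"`, whereas `rule_classifier` returns `"finance"` for the same text because the hand-written rule only looks for the word "match". Moving to a different set of categories would only require different training documents, not a new set of rules.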