A statistical learning model of text classification for support vector machines

Authors:
Thorsten Joachims
Affiliations:
GMD Forschungszentrum IT, Sankt Augustin, Germany
Venue:
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2001

Abstract

This paper develops a theoretical learning model of text classification for Support Vector Machines (SVMs). It connects the statistical properties of text-classification tasks with the generalization performance of an SVM in a quantitative way. Unlike conventional approaches to learning text classifiers, which rely primarily on empirical evidence, this model explains why and when SVMs perform well for text classification. In particular, it addresses the following questions: Why can support vector machines handle the large feature spaces in text classification effectively? How is this related to the statistical properties of text? What are sufficient conditions for applying SVMs to text-classification problems successfully?