Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?

Authors:
Edda Leopold; Jörg Kindermann
Affiliations:
GMD German National Research Center for Information Technology, Institute for Autonomous Intelligent Systems, Schloss Birlinghoven, D-53754 Sankt Augustin, Germany. edda.leopold@ais.fraunh ...; GMD German National Research Center for Information Technology, Institute for Autonomous Intelligent Systems, Schloss Birlinghoven, D-53754 Sankt Augustin, Germany. joerg.kindermann@ais.fr ...
Venue:
Machine Learning
Year:
2002


Abstract

The choice of kernel function is crucial to most applications of support vector machines (SVMs). In this paper, however, we show that in text classification, term-frequency transformations have a larger impact on SVM performance than the kernel itself. We discuss the role of importance weights (e.g. document frequency and redundancy), which is not yet fully understood in light of model complexity and computational cost, and we show that time-consuming lemmatization or stemming can be avoided even when classifying a highly inflectional language such as German.
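
To make the idea of a term-frequency transformation combined with an importance weight concrete, here is a minimal Python sketch. It is not the scheme evaluated in the paper: the logarithmic frequency transform, the inverse-document-frequency weight, the L2 normalization, the whitespace tokenizer, and the function names `tokenize`, `build_idf`, and `represent` are all assumptions chosen for illustration. Redundancy weights are not shown. The documents are used as raw word forms, without lemmatization or stemming.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace,
    # keeping inflected word forms as they are (no stemming).
    return text.lower().split()


def build_idf(docs):
    # Importance weight per term: log(N / document frequency).
    # A term that occurs in every document gets weight 0.
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def represent(doc, idf):
    # Term-frequency transformation: log(1 + raw frequency),
    # multiplied by the importance weight, then L2-normalized.
    raw_tf = Counter(tokenize(doc))
    weighted = {
        term: math.log(1 + freq) * idf.get(term, 0.0)
        for term, freq in raw_tf.items()
    }
    norm = math.sqrt(sum(w * w for w in weighted.values()))
    if norm == 0.0:
        return weighted
    return {term: w / norm for term, w in weighted.items()}


# Toy corpus of German word forms (illustrative data only).
docs = ["der hund bellt", "der hund schläft", "die katze schläft"]
idf = build_idf(docs)
vec = represent("der hund bellt bellt", idf)
```

The resulting sparse dictionary `vec` could be converted into a feature vector and passed to an SVM with any kernel; swapping the `math.log(1 + freq)` transform or the weight in `build_idf` changes the input-space representation without touching the classifier.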