Adding numbers to text classification

Authors:
Sofus A. Macskassy;Haym Hirsh
Affiliations:
New York University, NY;Rutgers University, NJ
Venue:
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Year:
2003

Citing 17
Cited 0

Neural network learning and expert systems

Neural network learning and expert systems
On changing continuous attributes into ordered discrete attributes

EWSL-91 Proceedings of the European working session on learning on Machine learning
The nature of statistical learning theory

The nature of statistical learning theory
Boosting and Rocchio applied to text filtering

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Making large-scale support vector machine learning practical

Advances in kernel methods
A simple, fast, and effective rule learner

AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
Machine Learning

Machine Learning
Brains, Behavior and Robotics

Brains, Behavior and Robotics
Introduction to Reinforcement Learning

Introduction to Reinforcement Learning
Converting numerical classification into text classification

Artificial Intelligence
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Making Better Use of Global Discretization

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Using text classifiers for numerical classification

IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
A study of cross-validation and bootstrap for accuracy estimation and model selection

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Discretizing continuous attributes in AdaBoost for text categorization

ECIR'03 Proceedings of the 25th European conference on IR research
Learning trees and rules with set-valued features

AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many real-world problems involve a combination of both text- and numerical-valued features. For example, in email classification, it is possible to use instance representations that consider not only the text of each message, but also numerical-valued features such as the length of the message or the time of day at which it was sent. Text-classification methods have thus far not easily incorporated numerical features. In earlier work we described an approach for converting numerical features into bags of tokens so that text classification methods can be applied to numerical classification problems, and showed that the resulting learning methods are competitive with traditional numerical classification methods. In this paper we use this as a way to learn on problems that involve a combination of text and numbers. We show that the results outperform competing methods. Further, we show that selecting a best classification method using text-only features and then adding numerical features to the problem (as might happen if numerical features are only later added to a pre existing text-classification problem) gives performance that rivals a more time-consuming approach of evaluating all classification methods using the full set of both text and numerical features.