Text classification using a few labeled examples

Authors:
Francesco Colace;Massimo De Santo;Luca Greco;Paolo Napoletano
Venue:
Computers in Human Behavior
Year:
2014

Abstract

Supervised text classifiers need to learn from many labeled examples to achieve high accuracy. In real settings, however, enough labeled examples are not always available, because human labeling is enormously time-consuming. For this reason, there has been recent interest in methods that can achieve high accuracy when the training set is small. In this paper we introduce a new single-label text classification method that performs better than baseline methods when the number of labeled examples is small. Unlike most existing methods, which typically use a vector of features composed of weighted words, the proposed approach uses a structured vector of features composed of weighted pairs of words. This vector of features is learned automatically from a set of documents, using a global method for term extraction based on Latent Dirichlet Allocation, implemented as a probabilistic topic model. Experiments performed using a small percentage of the original training set (about 1%) confirmed these expectations.
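
To make the idea of a feature vector built from weighted pairs of words more concrete, the following is a minimal sketch and not the authors' method. It assumes that simple document-level co-occurrence counts can stand in for the pair weights that the paper derives from Latent Dirichlet Allocation. The function names `pair_weights` and `score`, the parameter `top_k`, and the toy documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_weights(docs, top_k=5):
    """Return the top_k word pairs with normalized co-occurrence weights.

    Assumption: document-level co-occurrence frequency stands in for the
    LDA-derived pair weights described in the abstract.
    """
    counts = Counter()
    for doc in docs:
        # Count each unordered pair of distinct words once per document.
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    top = counts.most_common(top_k)
    total = sum(c for _, c in top)
    return {pair: c / total for pair, c in top}


def score(doc, weights):
    """Sum the weights of the feature pairs whose two words both occur in doc."""
    words = set(doc.lower().split())
    return sum(w for (a, b), w in weights.items() if a in words and b in words)


# Hypothetical toy documents for a single category.
docs = [
    "stock market prices fall",
    "stock market rally lifts prices",
    "market prices steady",
]
weights = pair_weights(docs, top_k=3)
```

In this sketch a new document could be scored against each category's pair weights and assigned to the highest-scoring category. A pair contributes only when both of its words appear, which is what distinguishes this representation from a plain vector of weighted single words.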