Pragmatic text mining: minimizing human effort to quantify many issues in call logs

Authors:
George Forman;Evan Kirshenbaum;Jaap Suermondt
Affiliations:
Hewlett-Packard Labs, Palo Alto, CA;Hewlett-Packard Labs, Palo Alto, CA;Hewlett-Packard Labs, Palo Alto, CA
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Citing 12
Cited 4

ThemeRiver: Visualizing Thematic Changes in Large Document Collections

IEEE Transactions on Visualization and Computer Graphics
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Frequent term-based text clustering

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
An extensive empirical study of feature selection metrics for text classification

The Journal of Machine Learning Research
Resource selection for domain-specific cross-lingual IR

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Diverse ensembles for active learning

ICML '04 Proceedings of the twenty-first international conference on Machine learning
A Response to Webb and Ting's On the Application of ROC Analysis to Predict Classification Performance Under Varying Class Distributions

Machine Learning
Learning from little: comparison of classifiers given little training

PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Discovering evolutionary theme patterns from text: an exploration of temporal text mining

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Model-based overlapping clustering

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Quantifying trends accurately despite classifier error and class imbalance

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Counting positives accurately despite inaccurate classification

ECML'05 Proceedings of the 16th European conference on Machine Learning

Quantifying trends accurately despite classifier error and class imbalance

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Quantifying counts and costs via classification

Data Mining and Knowledge Discovery
Hierarchical service analytics for improving productivity in an enterprise service center

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Class distribution estimation based on the Hellinger distance

Information Sciences: an International Journal

Quantified Score

Hi-index	0.00

Visualization

Abstract

We discuss our experiences in analyzing customer-support issues from the unstructured free-text fields of technical-support call logs. The identification of frequent issues and their accurate quantification is essential in order to track aggregate costs broken down by issue type, to appropriately target engineering resources, and to provide the best diagnosis, support and documentation for most common issues. We present a new set of techniques for doing this efficiently on an industrial scale, without requiring manual coding of calls in the call center. Our approach involves (1) a new text clustering method to identify common and emerging issues; (2) a method to rapidly train large numbers of categorizers in a practical, interactive manner; and (3) a method to accurately quantify categories, even in the face of inaccurate classifications and training sets that necessarily cannot match the class distribution of each new month's data. We present our methodology and a tool we developed and deployed that uses these methods for tracking ongoing support issues and discovering emerging issues at HP.