ThemeRiver: Visualizing Thematic Changes in Large Document Collections
IEEE Transactions on Visualization and Computer Graphics
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Frequent term-based text clustering
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
An extensive empirical study of feature selection metrics for text classification
The Journal of Machine Learning Research
Resource selection for domain-specific cross-lingual IR
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Diverse ensembles for active learning
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Learning from little: comparison of classifiers given little training
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Discovering evolutionary theme patterns from text: an exploration of temporal text mining
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Model-based overlapping clustering
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Quantifying trends accurately despite classifier error and class imbalance
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Counting positives accurately despite inaccurate classification
ECML'05 Proceedings of the 16th European conference on Machine Learning
Quantifying trends accurately despite classifier error and class imbalance
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Quantifying counts and costs via classification
Data Mining and Knowledge Discovery
Hierarchical service analytics for improving productivity in an enterprise service center
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Class distribution estimation based on the Hellinger distance
Information Sciences: an International Journal
Hi-index | 0.00 |
We discuss our experiences in analyzing customer-support issues from the unstructured free-text fields of technical-support call logs. The identification of frequent issues and their accurate quantification is essential in order to track aggregate costs broken down by issue type, to appropriately target engineering resources, and to provide the best diagnosis, support and documentation for most common issues. We present a new set of techniques for doing this efficiently on an industrial scale, without requiring manual coding of calls in the call center. Our approach involves (1) a new text clustering method to identify common and emerging issues; (2) a method to rapidly train large numbers of categorizers in a practical, interactive manner; and (3) a method to accurately quantify categories, even in the face of inaccurate classifications and training sets that necessarily cannot match the class distribution of each new month's data. We present our methodology and a tool we developed and deployed that uses these methods for tracking ongoing support issues and discovering emerging issues at HP.