Automated generation of model cases for help-desk applications

Authors:
S. M. Weiss; C. V. Apte
Affiliations:
IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, New York; IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, New York
Venue:
IBM Systems Journal
Year:
2002

Abstract

Document databases may be ill-formed, containing redundant and poorly organized documents. For example, a database of customers' descriptions of problems with products and the vendor's descriptions of their resolution may contain many descriptions of the same problem. A highly desirable goal is to transform the database into a concise set of summarized reports (model cases), which in turn are more amenable to search and problem resolution without expert intervention. In this paper, we describe techniques for automating the reduction of a database to its essential components. Our initial application is self-help for resolution of product problems. A lightweight document clustering method is described that operates in high dimensionality, processing tens of thousands of documents and grouping them into several thousand clusters. Techniques are described for summarization and exemplar selection to further refine the database contents. The method has been evaluated on a database of over 100,000 customer-service problem reports that are reduced to 3,000 clusters and 5,000 exemplar documents. Preliminary results are promising and demonstrate efficient clustering performance with excellent group similarity measures, reducing the original database size by several orders of magnitude.
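
The abstract does not spell out the clustering or exemplar-selection procedures, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes bag-of-words vectors, cosine similarity, a greedy single-pass clustering against each cluster's first document, and an exemplar chosen as the member most similar to the rest of its cluster. The function names (`vectorize`, `cosine`, `cluster`, `exemplar`) and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Return an L2-normalized bag-of-words vector as a dict (assumed representation)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def cluster(docs, threshold=0.5):
    """Greedy single-pass clustering (an assumption, not the paper's algorithm).

    Each document joins the cluster whose seed (first member) is most
    similar to it, provided the similarity exceeds `threshold`;
    otherwise it starts a new cluster.
    """
    vectors = [vectorize(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(vec, vectors[members[0]])
            if sim > best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters, vectors


def exemplar(members, vectors):
    """Pick the member with the highest total similarity to its cluster."""
    return max(
        members,
        key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in members),
    )


docs = [
    "printer will not print",
    "printer will not print pages",
    "printer will not print color pages",
    "cannot connect to network",
    "cannot connect to wireless network",
]
clusters, vectors = cluster(docs)
exemplars = [exemplar(members, vectors) for members in clusters]
```

Here the five toy problem reports collapse into two clusters, and each cluster is represented by a single exemplar document. A production system at the scale the abstract describes would need an index or similar shortcut to avoid comparing every document against every cluster.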