InfoAnalyzer: a computer-aided tool for building enterprise taxonomies

Authors:
Li Zhang;ShiXia Liu;Yue Pan;LiPing Yang
Affiliations:
IBM China Research Laboratory, Beijing, P.R. China;IBM China Research Laboratory, Beijing, P.R. China;IBM China Research Laboratory, Beijing, P.R. China;IBM China Research Laboratory, Beijing, P.R. China
Venue:
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Year:
2004

Abstract

In this paper we study the problem of collecting training samples for building enterprise taxonomies. We develop a computer-aided tool named InfoAnalyzer, which can effectively assist an enterprise in preparing the large set of samples needed for machine learning in text categorization. In our system, the enterprise category tree is initially defined by a set of keywords. The Google search engine is then used to construct a small set of labeled documents, and a topic tracking algorithm based on document length normalization is applied to enlarge the training corpus on the basis of these seed stories. Furthermore, we design a method to check the consistency of the training corpus. Experiments show that the resulting training corpus is good enough for statistical classification methods and also meets human requirements.
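
The corpus-enlargement step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the normalization formula, so the sketch assumes a pivoted-style length normalizer, and the names `pivoted_weights` and `enlarge_corpus` and the parameters `slope` and `threshold` are hypothetical. Seed documents form a topic profile, and a candidate is added when its length-normalized similarity to that profile reaches the threshold.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def pivoted_weights(tokens, avg_len, slope=0.2):
    """Log-scaled term frequencies divided by a pivoted length normalizer.

    The normalizer equals 1 for a document of average length and grows
    linearly with document length at the rate given by `slope`.
    """
    counts = Counter(tokens)
    norm = (1.0 - slope) + slope * (len(tokens) / avg_len)
    return {term: (1.0 + math.log(tf)) / norm for term, tf in counts.items()}


def enlarge_corpus(seeds, candidates, threshold=1.0, slope=0.2):
    """Return the candidates whose score against the seed profile
    is at least `threshold` (hypothetical acceptance rule)."""
    docs = [tokenize(d) for d in seeds + candidates]
    total_len = sum(len(d) for d in docs)
    if not seeds or total_len == 0:
        return []
    avg_len = total_len / len(docs)

    # Topic profile: average of the seed documents' term weights.
    profile = Counter()
    for seed in seeds:
        for term, w in pivoted_weights(tokenize(seed), avg_len, slope).items():
            profile[term] += w / len(seeds)

    accepted = []
    for cand in candidates:
        weights = pivoted_weights(tokenize(cand), avg_len, slope)
        # Counter returns 0 for terms missing from the profile.
        score = sum(profile[term] * w for term, w in weights.items())
        if score >= threshold:
            accepted.append(cand)
    return accepted


seeds = ["database server storage", "server storage backup"]
candidates = ["storage server upgrade guide", "holiday party schedule"]
print(enlarge_corpus(seeds, candidates))
```

In this toy run only the first candidate shares terms with the seed profile, so it is the only one accepted. A real system would also need stop-word handling and a tuned threshold per category.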