Corpus building for corporate knowledge discovery and management: a case study of manufacturing

Authors:
Ying Liu; Han Tong Loh
Affiliations:
Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China; Department of Mechanical Engineering, National University of Singapore, Singapore
Venue:
KES'07/WIRN'07 Proceedings of the 11th international conference, KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part I
Year:
2007

Abstract

Building a collection of electronic documents, i.e. a corpus, is a cornerstone of research in information retrieval, text mining and knowledge management. However, very few papers in the literature have discussed the concerns involved in building a corpus or explained the building process systematically. In this paper, we describe how we built an enterprise corpus, manufacturing corpus version 1 (MCV1), for corporate knowledge management purposes. We discuss the relevant issues, e.g. input texts, category labels and policies, as well as the corpus's parallel coding process and quality measurements. Real-world automated text classification experiments on MCV1 demonstrate the soundness of its coding process. Finally, we suggest how the proposed approach can be implemented more economically.
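The abstract mentions a parallel coding process with quality measurements but does not say which measures MCV1 uses. As one illustration only, the sketch below assumes two coders independently assign a set of category labels to each document and compares them with two common agreement scores: micro-averaged F1 (which is symmetric between the coders) and the exact-match rate. The function name `coder_agreement` and the category names in the usage example are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set

def coder_agreement(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical agreement check between two parallel coders.

    Only documents coded by both coders are compared. Micro-F1 is
    2 * |A ∩ B| / (|A| + |B|) summed over documents, which does not
    depend on which coder is treated as the reference.
    """
    docs = sorted(a.keys() & b.keys())
    if not docs:
        raise ValueError("no documents coded by both coders")
    overlap = sum(len(a[d] & b[d]) for d in docs)        # labels both coders chose
    total = sum(len(a[d]) + len(b[d]) for d in docs)     # all labels assigned
    # If every label set is empty, the coders trivially agree.
    micro_f1 = 2 * overlap / total if total else 1.0
    # Fraction of documents whose label sets are identical.
    exact = sum(a[d] == b[d] for d in docs) / len(docs)
    return {"micro_f1": micro_f1, "exact_match": exact}

# Illustrative (made-up) codings of three documents.
coder_a = {"d1": {"casting", "welding"}, "d2": {"machining"}, "d3": {"assembly"}}
coder_b = {"d1": {"casting"}, "d2": {"machining"}, "d3": {"inspection"}}
print(coder_agreement(coder_a, coder_b))
```

Here the coders share 2 labels out of 7 assigned in total, giving micro-F1 = 4/7, and they agree exactly on one document of three. Documents with low agreement would be natural candidates for adjudication by a third coder, though whether MCV1 follows this step is not stated in the abstract.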