Textual data mining for industrial knowledge management and text classification: A business oriented approach

Authors:
N. Ur-Rahman; J. A. Harding
Venue:
Expert Systems with Applications: An International Journal
Year:
2012


Abstract

Textual databases are useful sources of information and knowledge, and if they are well utilised, issues related to future project management and product or service quality improvement may be resolved. A large part of corporate information, approximately 80%, is available in textual data formats. Text classification techniques are well known for managing online sources of digital documents. Identifying the key issues discussed within textual data and classifying them into two different classes could help decision makers or knowledge workers to manage their future activities better. This research is relevant to most text-based documents and is demonstrated on Post Project Reviews (PPRs), which are a valuable source of information and knowledge. The application of textual data mining techniques for discovering useful knowledge and classifying textual data into different classes is a relatively new area of research. The work presented in this paper focuses on hybrid applications of text mining (textual data mining) techniques to classify textual data into two different classes. The research applies clustering techniques at the first stage and Apriori Association Rule Mining at the second stage. Apriori Association Rule Mining is applied to generate Multiple Key Term Phrasal Knowledge Sequences (MKTPKS), which are later used for classification. Additionally, studies were made to improve the classification accuracies of the classifiers, namely C4.5, K-NN, Naive Bayes and Support Vector Machines (SVMs). The classification accuracies were measured and the results compared with those of a single-term-based classification model. The proposed methodology could be used to analyse any free-formatted textual data; in the current research it is demonstrated on an industrial dataset of PPRs collected from the construction industry. The information in these reviews is codified in multiple formats, but in the current research only free-formatted text documents are examined. Experiments showed that classifier performance improved when the proposed methodology was adopted.
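
To make the second stage more concrete, the following is a minimal sketch of Apriori-style mining of frequent key-term sets, followed by a binary document representation built from the multi-term sets. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how MKTPKS are constructed, so the toy documents, the `min_count` threshold, and the function names `frequent_termsets` and `to_features` are all hypothetical. The clustering stage and the classifiers (C4.5, K-NN, Naive Bayes, SVMs) are omitted.

```python
from itertools import combinations


def frequent_termsets(transactions, min_count, max_size=3):
    """Apriori-style search: return {frozenset(terms): support count}.

    transactions: list of sets of key terms (one set per document).
    min_count:    minimum number of documents a term set must appear in
                  (hypothetical threshold chosen for illustration).
    """
    # Level 1: count single terms.
    counts = {}
    for terms in transactions:
        for term in terms:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {k: c for k, c in counts.items() if c >= min_count}
    frequent = dict(current)

    size = 2
    while current and size <= max_size:
        # Candidate generation: join frequent (size-1)-sets, then prune any
        # candidate that has an infrequent (size-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current
                for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        counts = {
            cand: sum(1 for terms in transactions if cand <= terms)
            for cand in candidates
        }
        current = {k: c for k, c in counts.items() if c >= min_count}
        frequent.update(current)
        size += 1
    return frequent


def to_features(terms, key_sets):
    """Binary vector: 1 if the document contains every term of a key set."""
    return [1 if key_set <= terms else 0 for key_set in key_sets]


# Toy documents already reduced to sets of key terms (invented data).
docs = [
    {"cost", "delay", "client"},
    {"cost", "delay", "design"},
    {"cost", "delay", "client"},
    {"safety", "site"},
]

frequent = frequent_termsets(docs, min_count=2)

# Keep only multi-term sets, in a stable order, as stand-ins for the
# multi-term sequences that would be passed on to a classifier.
key_sets = sorted(
    (s for s in frequent if len(s) >= 2),
    key=lambda s: (len(s), sorted(s)),
)
features = [to_features(d, key_sets) for d in docs]
```

Each row of `features` could then be passed to any standard classifier in place of a single-term representation, which is the comparison the abstract describes.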