Applying machine learning in accounting research

Authors:
Machteld Van den Bogaerd; Walter Aerts
Affiliations:
Department of Accounting and Finance, University of Antwerp, Faculty of Applied Economics, Prinsstraat 13, B-2000 Antwerp, Belgium (both authors)
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

Accounting researchers often have to analyze large bodies of text in order to derive meaningful insights. This is usually done manually by several human coders, which makes the process time-consuming and expensive, and often neither replicable nor accurate. To mitigate these problems, we perform a feasibility study of the applicability of computer-aided content analysis techniques to accounting research. Krippendorff (1980) distinguishes three components of reliability: stability, reproducibility and accuracy. Because computer-aided text classification is inherently objective and repeatable, the first two, stability and reproducibility, are not an issue, so this paper focuses exclusively on the third: the accuracy of the classification algorithm. Although inaccurate classification results are worthless, surprisingly few research papers report the accuracy of the classification method they use. After a survey of the available techniques, we perform an in-depth analysis of the most promising one, LPU (Learning from Positive and Unlabelled), which achieves an F-value and an accuracy of about 90%. An accuracy of about 90% means that a text drawn from a population comparable to the test set has roughly a 90% probability of being classified correctly.
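LPU builds a text classifier from a set of positive examples and a set of unlabeled documents, without any hand-labeled negatives. A common two-step strategy in this line of work first extracts "reliable negatives" from the unlabeled set and then trains a classifier on the positives versus those negatives. The following is a minimal, standard-library sketch of that idea. It assumes pre-tokenized documents and uses Rocchio prototypes in step 1, with weights alpha = 16 and beta = 4 (an assumed, commonly used setting). For simplicity, step 2 substitutes a plain Rocchio classifier where LPU offers stronger learners such as SVMs. This is not LPU's own implementation, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Unit-length TF-IDF vectors (term -> weight dicts) for tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        # Smoothed IDF log(1 + n/df) keeps terms that occur in every doc non-zero.
        vec = {t: c * math.log(1 + n / df[t]) for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def centroid(vectors):
    """Component-wise mean of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def rocchio(target, other, alpha=16.0, beta=4.0):
    """Rocchio prototype: alpha * target - beta * other (assumed weights)."""
    terms = set(target) | set(other)
    return {t: alpha * target.get(t, 0.0) - beta * other.get(t, 0.0) for t in terms}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def reliable_negatives(pos_vecs, unl_vecs):
    """Step 1: indices of unlabeled docs closer to the 'unlabeled' prototype."""
    p_mean, u_mean = centroid(pos_vecs), centroid(unl_vecs)
    pos_proto, neg_proto = rocchio(p_mean, u_mean), rocchio(u_mean, p_mean)
    return [i for i, d in enumerate(unl_vecs)
            if cosine(neg_proto, d) > cosine(pos_proto, d)]


def classify_unlabeled(positive_docs, unlabeled_docs):
    """Step 2: label each unlabeled doc True (positive) or False (negative)."""
    vecs = tfidf_vectors(positive_docs + unlabeled_docs)
    pos, unl = vecs[:len(positive_docs)], vecs[len(positive_docs):]
    negs = [unl[i] for i in reliable_negatives(pos, unl)]
    if not negs:
        raise ValueError("step 1 found no reliable negatives")
    # Simplified step 2: Rocchio classifier on positives vs reliable negatives.
    p_mean, n_mean = centroid(pos), centroid(negs)
    pos_proto, neg_proto = rocchio(p_mean, n_mean), rocchio(n_mean, p_mean)
    return [cosine(pos_proto, d) >= cosine(neg_proto, d) for d in unl]


if __name__ == "__main__":
    # Hypothetical sentences: positives are about goodwill impairment.
    positives = [["goodwill", "impairment", "writedown"],
                 ["goodwill", "impairment", "loss"]]
    unlabeled = [["goodwill", "impairment", "loss"],
                 ["dividend", "payout", "policy"],
                 ["dividend", "share", "buyback"]]
    print(classify_unlabeled(positives, unlabeled))  # [True, False, False]
```

On this toy corpus, the unlabeled sentence that repeats a positive one ends up on the positive side, while the two dividend sentences are first picked out as reliable negatives and then labeled negative. With realistic corpora, the quality of the reliable-negative set largely determines how well step 2 can do.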
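The abstract reports both an F-value and an accuracy of about 90%, and the two measure different things. Accuracy is the share of all texts classified correctly. The F-value, assuming the balanced F-measure (F1), is the harmonic mean of precision and recall for the class of interest. The sketch below computes all four from a hand-coded test set. The function name `evaluate` and the labels `relevant` and `other` are hypothetical. The usage example shows why reporting accuracy alone can mislead when the class of interest is rare.

```python
def evaluate(gold, predicted, positive):
    """Accuracy, precision, recall and F-value (F1) for a binary labelling.

    `positive` is the label of the class of interest; every other label is
    treated as the negative class.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = len(pairs) - tp - fp - fn  # the four cells partition all pairs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical test set: 10 relevant texts among 100.
    gold = ["relevant"] * 10 + ["other"] * 90
    lazy = ["other"] * 100  # a classifier that never predicts "relevant"
    print(evaluate(gold, lazy, positive="relevant"))
    # {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

A classifier that never finds a relevant text still scores 90% accuracy on this skewed set, but its F-value is zero. This is why it matters that both figures, rather than accuracy alone, come out at about 90%.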