Topic discovery and topic-driven clustering for audit method datasets

  • Authors:
  • Ying Zhao;Wanyu Fu;Shaobin Huang

  • Affiliations:
  • Department of Computer Science and Technology, Tsinghua University, Beijing, China;Department of Computer Science and Technology, Tsinghua University, Beijing, China;College of Computer Science and Technology, Harbin Engineering University, Harbin, China

  • Venue:
  • ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part II
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

As the promotion of China's Golden Auditing Project and the fast growth of on-line auditing, there are thousands of new computer audit methods emerged every year to fulfill various needs of audit practices. How to organize these existing computer audit methods and use them intelligently have become a fundamental and challenging problem. In this paper, we propose to use topic-driven clustering methods to organize computer audit methods according to the system of computer audit methods that is issued by the National Audit Office of China. We also apply Latent Dirichlet allocation (LDA) analysis to audit method datasets at different levels of granularity. Our experimental results on social insurance computer audit methods show that the topic-driven clustering scheme with topics created by domain experts is the overall best scheme. It achieved an average purity of 0.862 across the datasets. Topics discovered by LDA were consistent with classes defined in the taxonomy for four out of five datasets, and they were effective when used in the topic-driven clustering scheme.