With the promotion of China's Golden Auditing Project and the rapid growth of online auditing, thousands of new computer audit methods emerge every year to meet the various needs of audit practice. How to organize these existing computer audit methods and use them intelligently has become a fundamental and challenging problem. In this paper, we propose using topic-driven clustering methods to organize computer audit methods according to the system of computer audit methods issued by the National Audit Office of China. We also apply latent Dirichlet allocation (LDA) analysis to audit method datasets at different levels of granularity. Our experimental results on social insurance computer audit methods show that the topic-driven clustering scheme with topics created by domain experts is the best scheme overall, achieving an average purity of 0.862 across the datasets. Topics discovered by LDA were consistent with the classes defined in the taxonomy for four out of five datasets, and they were effective when used in the topic-driven clustering scheme.
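
For readers unfamiliar with the purity measure reported above, the following is a minimal sketch of how it is commonly computed: each cluster is credited with the size of its largest class, and the total is divided by the number of items. The abstract does not state which variant of purity was used, so this standard definition is an assumption. The function name `purity`, the toy cluster assignments, and the class labels are hypothetical and are not taken from the paper's datasets.

```python
from collections import Counter


def purity(cluster_ids, class_labels):
    """Return the fraction of items that fall in the majority class of their cluster.

    Assumes the common definition: (1 / n) * sum over clusters of the
    size of the largest class within that cluster.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how many items of each class land in each cluster.
    per_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1

    # Credit each cluster with its largest class, then normalize by n.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: eight audit methods in three clusters.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["pension", "pension", "medical", "medical", "medical",
          "unemployment", "unemployment", "pension"]
score = purity(clusters, labels)
print(score)
```

An average purity across several datasets, as reported in the abstract, would then be the mean of this score computed separately on each dataset.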