Systematic interviewer error is a potential issue in any health survey, and it can be especially pernicious in low- and middle-income countries, where survey teams may face limited supervision, chaotic environments, language barriers, and low literacy. Survey teams in such environments could benefit from software that leverages mobile data collection tools to provide automated data quality control. As a first step toward such software, we investigate and test several algorithms that find anomalous patterns in data. We validate the algorithms using one labeled data set and two unlabeled data sets from two community outreach programs in East Africa. In the labeled set, some of the data is known to be fabricated and some is believed to be relatively accurate. The unlabeled sets come from actual field operations. We demonstrate the feasibility of tools for automated data quality control by showing that the algorithms detect the fabricated data in the labeled set with high sensitivity and specificity, and that they surface compelling anomalies in the unlabeled sets.
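The abstract does not specify the algorithms used, but the evaluation it describes — flagging anomalous records and scoring the flags against known fabrication labels — can be illustrated with a minimal sketch. The z-score anomaly rule, the threshold, and all data below are assumptions for illustration, not the paper's method:

```python
# Hypothetical sketch: flag values (e.g., a per-interviewer summary statistic)
# that deviate strongly from the group, then score the flags against labels.
# All data is synthetic; the abstract does not name the actual algorithms.

def z_scores(values):
    """Population standard scores for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5 or 1.0  # avoid division by zero when all values agree
    return [(v - mean) / std for v in values]

def flag_anomalies(values, threshold=1.5):
    """Mark values whose |z| exceeds the threshold as anomalous."""
    return [abs(z) > threshold for z in z_scores(values)]

def sensitivity_specificity(predicted, actual):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic example: four ordinary interviewers and one with an extreme value.
values = [10, 11, 9, 10, 30]
labels = [False, False, False, False, True]  # True = known fabricated
flags = flag_anomalies(values)
sens, spec = sensitivity_specificity(flags, labels)
```

In this toy case the single extreme value is flagged and both sensitivity and specificity are 1.0; real survey data would of course require richer features and more careful thresholds.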