An Experimental Comparison of a Document Deception Detection Policy using Real and Artificial Deception

Authors:
Yanjuan Yang;Michael Mannino
Affiliations:
Automapath, Inc.;University of Colorado Denver
Venue:
Journal of Data and Information Quality (JDIQ)
Year:
2012

Citing 17
Cited 1

An experimental evaluation of the assumption of independence in multiversion programming

IEEE Transactions on Software Engineering
Learning in the presence of malicious errors

SIAM Journal on Computing
Information Retrieval

Information Retrieval
Learning From Noisy Examples

Machine Learning
An Exploratory Study into Deception Detection in Text-Based Computer-Mediated Communication

HICSS '03 Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS'03) - Track1 - Volume 1
Synthesizing Test Data for Fraud Detection Systems

ACSAC '03 Proceedings of the 19th Annual Computer Security Applications Conference
Deception Detection under Varying Electronic Media and Warning Conditions

HICSS '04 Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04) - Track 1 - Volume 1
Class Noise vs. Attribute Noise: A Quantitative Study

Artificial Intelligence Review
Heuristics and Modalities in Determining Truth Versus Deception

HICSS '05 Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05) - Track 1 - Volume 01
StrikeCOM: A Multi-Player Online Strategy Game for Researching and Teaching Group Dynamics

HICSS '05 Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05) - Track 1 - Volume 01
Lying on the Web: Implications for Expert Systems Redesign

Information Systems Research
Statistical Comparisons of Classifiers over Multiple Data Sets

The Journal of Machine Learning Research
A Comparison of Classification Methods for Predicting Deception in Computer-Mediated Communication

Journal of Management Information Systems
A Statistical Language Modeling Approach to Online Deception Detection

IEEE Transactions on Knowledge and Data Engineering
Classification algorithm sensitivity to training data with non representative attribute noise

Decision Support Systems
Detecting deception through linguistic analysis

ISI'03 Proceedings of the 1st NSF/NIJ conference on Intelligence and security informatics
An experimental comparison of real and artificial deception using a deception generation model

Decision Support Systems

An experimental comparison of real and artificial deception using a deception generation model

Decision Support Systems

Abstract

Developing policies to screen documents for deception is often hampered by the cost of data collection and by the inability to evaluate policy alternatives when data are scarce. Artificially generated deception data can lower collection costs and increase the amount of data available, but the impact of using such data is not well understood. This article studies the impact of artificially generated deception on document screening policies. The deception and truth data were collected from financial aid applications, a document-centric area with limited resources for screening. Real deception was augmented with artificial data generated by noise and deception generation models. Using the real and artificially generated data, we designed an innovative experiment with deception type and deception rate as factors, and harmonic mean and cost as outcome variables. We used two budget models (fixed and variable) typically employed by financial aid offices to measure the cost of noncompliance in financial aid applications. The analysis included an evaluation of a common policy for deception screening using both fixed and varying screening rates. The results provide evidence that the screening policy performs similarly with real and artificial deception, suggesting that artificially generated deception could be used to reduce the cost of obtaining training data.
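
The abstract names harmonic mean as an outcome variable and a screening rate as a policy parameter, but does not define either. The sketch below is only an illustration under stated assumptions: it takes the harmonic mean to be that of precision and recall, treats a screening policy as "review the top fraction of documents by a risk score", and draws purely synthetic scores. The function names, the scoring model, and all parameter values are hypothetical and are not taken from the article.

```python
import random


def screen_top_fraction(scores, screening_rate):
    """Return the indices of the highest-scoring documents, up to the given fraction."""
    n_selected = int(round(len(scores) * screening_rate))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_selected])


def harmonic_mean_of_precision_recall(selected, deceptive):
    """Harmonic mean of precision and recall (assumed meaning of 'harmonic mean').

    Returns 0.0 when no deceptive document was selected.
    """
    true_positives = len(selected & deceptive)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(selected)
    recall = true_positives / len(deceptive)
    return 2 * precision * recall / (precision + recall)


def simulate(n_documents, deception_rate, screening_rate, seed=0):
    """Score one hypothetical screening run on synthetic documents."""
    rng = random.Random(seed)
    n_deceptive = int(n_documents * deception_rate)
    deceptive = set(rng.sample(range(n_documents), n_deceptive))
    # Hypothetical risk score: deceptive documents score higher on average.
    scores = [
        rng.gauss(1.0 if i in deceptive else 0.0, 1.0)
        for i in range(n_documents)
    ]
    selected = screen_top_fraction(scores, screening_rate)
    return harmonic_mean_of_precision_recall(selected, deceptive)


if __name__ == "__main__":
    # Vary the deception rate while holding the screening rate fixed.
    for rate in (0.05, 0.10, 0.20):
        print(rate, round(simulate(1000, rate, screening_rate=0.30), 3))
```

Running the script prints one score per deception rate. Repeating the run with scores drawn from a different generator would give a rough analogue of comparing data sources, although the article's actual noise and deception generation models are not reproduced here.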