Likelihood ratio estimation in forensic identification using similarity and rarity

Authors:
Yi Tang; Sargur N. Srihari
Venue:
Pattern Recognition
Year:
2014

Abstract

Forensic identification is the task of determining whether observed evidence arose from a known source. It is useful to attach probabilities to identification and exclusion opinions, either for presentation in court or to evaluate the discriminative power of a given set of attributes. At present, in most forensic domains other than DNA evidence, such statements cannot be made because the necessary probability distributions cannot be computed with reasonable accuracy, although the probabilistic approach itself is well understood. In principle, it involves determining a likelihood ratio (LR): the ratio of the joint probability of the evidence and source under the identification hypothesis (that the evidence came from the source) to the same joint probability under the exclusion hypothesis (that the evidence did not arise from the source). Evaluating the joint probability is computationally intractable when the number of variables is even moderately large. It is also statistically infeasible, since the number of parameters to be estimated from the data grows exponentially with the number of variables. An approximate method replaces the joint probability with the probability of the distance (or similarity) between evidence and object under the two hypotheses. While this reduces the complexity to linear in the number of variables, it is an oversimplification that leads to errors. We consider a third method, which decomposes the LR into a product of two factors, one based on distance and the other on rarity. This result, which is exact in the univariate Gaussian case, has intuitive appeal: forensic examiners assign high importance to rare feature values in the evidence and low importance to common ones. We generalize this approach to more complex data such as vectors and graphs, which makes LR estimation computationally tractable.
Empirical evaluations of the three methods, carried out on several data types (continuous features, binary features, multinomial features and graphs) and several modalities (handwriting with binary features, handwriting with multinomial features, and footwear impressions with continuous features), show that the distance-and-rarity method performs significantly better than the distance-only method.
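
The univariate Gaussian case mentioned in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation, and the exact form of their decomposition may differ. It assumes a simple hierarchical model chosen only for illustration: each source has a latent value drawn from N(mu, tau2), and each observation adds independent N(0, sigma2) noise. All function names and the parameters mu, tau2 and sigma2 are hypothetical. Under these assumptions, the joint LR factors exactly into a distance term in d = o - e and a rarity term in the midpoint m = (o + e) / 2, while the distance-only LR keeps just the first term.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def joint_pdf(o, e, mu, var, cov):
    """Bivariate normal density with common mean mu, common variance var, covariance cov."""
    x, y = o - mu, e - mu
    det = var * var - cov * cov
    quad = (var * (x * x + y * y) - 2.0 * cov * x * y) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def lr_joint(o, e, mu, tau2, sigma2):
    """LR from the full joint density of (o, e) under each hypothesis."""
    v = tau2 + sigma2  # marginal variance of a single observation
    same_source = joint_pdf(o, e, mu, v, tau2)  # shared latent value => covariance tau2
    diff_source = normal_pdf(o, mu, v) * normal_pdf(e, mu, v)  # independent observations
    return same_source / diff_source


def lr_distance(o, e, tau2, sigma2):
    """Distance-only LR: uses only d = o - e and ignores where the pair lies."""
    d = o - e
    return normal_pdf(d, 0.0, 2.0 * sigma2) / normal_pdf(d, 0.0, 2.0 * (tau2 + sigma2))


def rarity_factor(o, e, mu, tau2, sigma2):
    """Factor depending only on the midpoint m; it grows as m moves away from mu."""
    m = 0.5 * (o + e)
    return normal_pdf(m, mu, tau2 + 0.5 * sigma2) / normal_pdf(m, mu, 0.5 * (tau2 + sigma2))


def lr_distance_rarity(o, e, mu, tau2, sigma2):
    """Product of the distance term and the rarity term."""
    return lr_distance(o, e, tau2, sigma2) * rarity_factor(o, e, mu, tau2, sigma2)


if __name__ == "__main__":
    mu, tau2, sigma2 = 0.0, 4.0, 1.0  # illustrative values, not taken from the paper
    for label, (o, e) in {"common": (0.1, -0.1), "rare": (5.1, 4.9)}.items():
        print(
            label,
            lr_joint(o, e, mu, tau2, sigma2),
            lr_distance_rarity(o, e, mu, tau2, sigma2),
            lr_distance(o, e, tau2, sigma2),
        )
```

In this toy model the two pairs have the same distance, so the distance-only LR cannot tell them apart, whereas the factored LR matches the joint LR and gives more weight to the pair that lies far from the population mean.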