Likelihood ratio estimation in forensic identification using similarity and rarity

  • Authors: Yi Tang; Sargur N. Srihari
  • Venue: Pattern Recognition
  • Year: 2014

Abstract

Forensic identification is the task of determining whether or not observed evidence arose from a known source. It is useful to associate probabilities with identification/exclusion opinions, either for presentation in court or to evaluate the discriminative power of a given set of attributes. At present, in most forensic domains outside of DNA evidence, such probabilistic statements cannot be made because the necessary probability distributions cannot be computed with reasonable accuracy, although the probabilistic approach itself is well understood. In principle, it involves determining a likelihood ratio (LR): the ratio of the joint probability of the evidence and the source under the identification hypothesis (that the evidence came from the source) to that under the exclusion hypothesis (that the evidence did not arise from the source). Evaluating the joint probability is computationally intractable when the number of variables is even moderately large. It is also statistically infeasible, since the number of parameters to be estimated from the data grows exponentially with the number of variables. An approximate method replaces the joint probability with the probability of the distance (or similarity) between the evidence and the known object under the two hypotheses. While this reduces the complexity to linear in the number of variables, it is an oversimplification that leads to errors. We consider a third method, which decomposes the LR into a product of two factors, one based on distance and the other on rarity. This result, which is exact for the univariate Gaussian case, has intuitive appeal: forensic examiners assign high importance to rare feature values in the evidence and low importance to common ones. We generalize this approach to more complex data such as vectors and graphs, which makes LR estimation computationally tractable.
Empirical evaluations of the three methods, carried out on several data types (continuous features, binary features, multinomial features and graphs) and several modalities (handwriting with binary features, handwriting with multinomial features and footwear impressions with continuous features), show that the distance-and-rarity method is significantly better than the distance-only method.
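
The claim that the distance-and-rarity factorization is exact in the univariate Gaussian case can be checked numerically. The sketch below is an illustration under assumptions that do not come from the abstract: each source has a mean drawn from N(mu, tau2), each observation adds independent noise N(0, sigma2), distance is the difference d = o - e, and rarity is measured at the midpoint m = (o + e) / 2. The function names and parameter values are hypothetical, and the paper's own formulation may differ in detail.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bivariate_pdf(x, y, mean, var, cov):
    """Density of a bivariate Gaussian with equal means, equal variances and covariance cov."""
    det = var * var - cov * cov
    dx, dy = x - mean, y - mean
    quad = (var * dx * dx - 2.0 * cov * dx * dy + var * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def joint_lr(o, e, mu, tau2, sigma2):
    """LR from the full joint density of object o and evidence e."""
    total = tau2 + sigma2
    same_source = bivariate_pdf(o, e, mu, total, tau2)  # shared source mean => covariance tau2
    diff_source = normal_pdf(o, mu, total) * normal_pdf(e, mu, total)  # independent draws
    return same_source / diff_source


def distance_lr(o, e, tau2, sigma2):
    """Distance-only LR: density of d = o - e under each hypothesis."""
    d = o - e
    return normal_pdf(d, 0.0, 2.0 * sigma2) / normal_pdf(d, 0.0, 2.0 * (tau2 + sigma2))


def rarity_factor(o, e, mu, tau2, sigma2):
    """Rarity factor: density of the midpoint m under each hypothesis."""
    m = 0.5 * (o + e)
    return normal_pdf(m, mu, tau2 + 0.5 * sigma2) / normal_pdf(m, mu, 0.5 * (tau2 + sigma2))


# Hypothetical population parameters, chosen only for illustration.
MU, TAU2, SIGMA2 = 0.0, 4.0, 1.0

# Two pairs with the same distance: one near the population mean, one far from it.
for label, (o, e) in {"common": (0.1, -0.1), "rare": (5.1, 4.9)}.items():
    full = joint_lr(o, e, MU, TAU2, SIGMA2)
    dist = distance_lr(o, e, TAU2, SIGMA2)
    rare = rarity_factor(o, e, MU, TAU2, SIGMA2)
    print(label, "joint:", full, "distance:", dist, "rarity:", rare, "product:", dist * rare)
```

Under these assumptions d and m are uncorrelated, and hence independent, under both hypotheses, so the product of the two factors reproduces the joint LR. Both pairs get the same distance-only LR, while the rarity factor is larger for the pair far from the population mean, which matches the intuition that agreement on a rare value is stronger evidence.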