Similarity Visualization for the Grouping of Forensic Speech Recordings

Authors:
Klara A. Weiand;Jos S. Bouten;Cor J. Veenman
Affiliations:
Intelligent Systems Lab, University of Amsterdam, Amsterdam, The Netherlands;Digital Technology & Biometrics Department, Netherlands Forensic Institute, The Hague, The Netherlands;Intelligent Systems Lab, University of Amsterdam, Amsterdam, The Netherlands and Digital Technology & Biometrics Department, Netherlands Forensic Institute, The Hague, The Netherlands
Venue:
IWCF '08 Proceedings of the 2nd international workshop on Computational Forensics
Year:
2008

Abstract

In a forensic phone wiretapping investigation, a major problem is to get the full picture of the speakers involved. Typically, the wiretapped speech recordings are grouped using a clustering tool. The main disadvantage of such an approach is that grouping errors accumulate in a bootstrapped scenario. In this paper, we propose a visual approach to finding similar speech recordings that probably stem from the same speaker. We first model the speech recordings and define suitable similarity measures between recordings. Then, through an approximate 2-D visualization of the inter-speech similarities, the investigator can identify clear groups of recordings as well as recordings that are harder to differentiate. We conducted extensive experiments on phone data from 50 speakers with 2 recordings per speaker. We tested the quality of the 2-D visualization in relation to the original high-dimensional similarities. For the original high-dimensional similarity measure, the nearest recording is almost always the one from the same speaker. In the 2-D visualization, averaged over all speech recordings, a recording of the same speaker is among the 10 nearest recordings.
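
The evaluation described in the abstract (how close the same-speaker recording lies, before and after projection to 2-D) can be illustrated with a minimal sketch. The abstract does not name the speech model, the similarity measure, or the projection technique, so everything below is an assumption made for illustration: synthetic feature vectors stand in for modeled recordings, Euclidean distance stands in for the similarity measure, and classical multidimensional scaling (MDS) stands in for the 2-D visualization. The function names are hypothetical and the numbers it produces say nothing about the paper's results.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X (assumed dissimilarity)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, dims=2):
    """Classical MDS: embed points in `dims` dimensions from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dims]        # indices of the largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


def same_speaker_ranks(D, speakers):
    """For each recording, the rank (1 = nearest) of the closest other
    recording that carries the same speaker label."""
    speakers = np.asarray(speakers)
    ranks = []
    for i in range(len(speakers)):
        order = np.argsort(D[i])
        order = order[order != i]              # exclude the recording itself
        match = np.nonzero(speakers[order] == speakers[i])[0]
        ranks.append(int(match[0]) + 1)
    return np.array(ranks)


# Synthetic stand-in data: 50 speakers, 2 recordings each, 20-D features.
rng = np.random.default_rng(0)
centers = rng.normal(size=(50, 20))
X = np.repeat(centers, 2, axis=0) + 0.1 * rng.normal(size=(100, 20))
speakers = np.repeat(np.arange(50), 2)

D_high = pairwise_distances(X)                 # original high-dimensional distances
Y = classical_mds(D_high, dims=2)              # approximate 2-D layout
D_2d = pairwise_distances(Y)

ranks_high = same_speaker_ranks(D_high, speakers)
ranks_2d = same_speaker_ranks(D_2d, speakers)
print("mean same-speaker rank, high-D:", ranks_high.mean())
print("mean same-speaker rank, 2-D:  ", ranks_2d.mean())
```

Comparing the two mean ranks gives a simple measure of how much neighborhood structure the 2-D layout loses relative to the original distances, which is the kind of comparison the abstract reports.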