Digital forensic text string searching: Improving information retrieval effectiveness by thematically clustering search results

  • Authors:
  • Nicole Lang Beebe; Jan Guynes Clark

  • Affiliations:
  • The University of Texas at San Antonio, Department of IS&TM, One UTSA Circle, San Antonio, TX 78249, United States; The University of Texas at San Antonio, Department of IS&TM, One UTSA Circle, San Antonio, TX 78249, United States

  • Venue:
  • Digital Investigation: The International Journal of Digital Forensics & Incident Response
  • Year:
  • 2007

Abstract

Current digital forensic text string search tools use matching and/or indexing algorithms to search digital evidence at the physical level and locate specific text strings. They are designed to achieve 100% query recall (i.e., to find all instances of the text strings). Given the nature of the data set, this leads to an extremely high incidence of hits that are not relevant to investigative objectives. Although Internet search engines suffer from the same problem, they employ ranking algorithms to present search results in a manner that is more effective and efficient from the user's perspective. Current digital forensic text string search tools fail to group and/or order search hits in a manner that appreciably improves the investigator's ability to get to the relevant hits first (or at least more quickly). This research proposes and empirically tests the feasibility and utility of post-retrieval clustering of digital forensic text string search results, specifically by using Kohonen Self-Organizing Maps, a self-organizing neural network approach. This paper is presented as a work in progress. A working tool has been developed and experimentation has begun. Findings regarding the feasibility and utility of the proposed approach will be presented at DFRWS 2007, along with suggestions for follow-on research.
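
As a rough illustration of the post-retrieval clustering idea described in the abstract, the sketch below groups a handful of search-hit contexts on a small Kohonen Self-Organizing Map. It is not the authors' tool. The bag-of-words vectorization, the 2x2 grid, the learning-rate and neighborhood-radius schedules, the sample hit strings, and all function names (`vectorize`, `best_matching_unit`, `train_som`, `cluster_hits`) are assumptions chosen for brevity.

```python
import math
import random
from collections import defaultdict


def vectorize(texts):
    """Turn each text into a length-normalized bag-of-words vector (assumed representation)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w in t.lower().split():
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def best_matching_unit(weights, v):
    """Return the grid cell whose weight vector is closest (squared Euclidean) to v."""
    return min(
        weights,
        key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], v)),
    )


def train_som(vectors, rows=2, cols=2, epochs=50, seed=0):
    """Train a rows x cols SOM; the linear decay schedules are illustrative assumptions."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                                # decaying learning rate
        radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5    # shrinking neighborhood
        for v in vectors:
            br, bc = best_matching_unit(weights, v)
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))        # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])             # pull cell toward the input
    return weights


def cluster_hits(hits, **som_args):
    """Group search hits by the SOM cell that best matches each one."""
    vectors = vectorize(hits)
    weights = train_som(vectors, **som_args)
    clusters = defaultdict(list)
    for hit, v in zip(hits, vectors):
        clusters[best_matching_unit(weights, v)].append(hit)
    return dict(clusters)


# Hypothetical hit contexts; real input would be text surrounding each string match.
hits = [
    "wire transfer to offshore account",
    "offshore account wire transfer confirmed",
    "kernel driver loaded at boot",
    "boot log kernel driver error",
]
for cell, members in sorted(cluster_hits(hits).items()):
    print(cell, members)
```

Each printed grid cell acts as one thematic group, so an investigator could review or dismiss a whole cell at a time rather than scanning hits in physical-offset order. Whether such groupings actually help is the empirical question the paper sets out to test.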