Digital forensic investigations often rely on search procedures applied to large data sets to find evidence. Such procedures need appropriate fault-tolerant distance measures so that evidence can be detected even if it has been distorted or partially erased on the searched media. One suitable fault-tolerant distance measure is the constrained edit distance, in which the constraints are the maximum numbers of consecutive insertions and deletions. However, its computation has too high a time complexity. We propose a two-phase indexless search procedure for forensic evidence search that uses the q-gram distance instead of the constrained edit distance. The q-gram distance is known to approximate the unconstrained edit distance well; we study how well it approximates the edit distance under the special constraints needed in forensic search applications. We compare the performance of the search procedure when each of the two distances is used in it. Experimental results show that, for some values of q, the procedure using the q-gram distance achieves almost the same accuracy as the procedure using the constrained edit distance, while being much more efficient, because the q-gram distance has a much lower computational time complexity.
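The q-gram distance can be illustrated with a short sketch. This is a minimal Python version assuming Ukkonen's usual definition: the sum, over all strings v of length q, of the absolute difference between the number of occurrences of v in x and in y. The abstract does not say which q-gram variant or which value of q the authors use, and the names `qgram_profile` and `qgram_distance` are hypothetical.

```python
from collections import Counter


def qgram_profile(s: str, q: int) -> Counter:
    """Multiset of all length-q substrings of s (empty if len(s) < q)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x: str, y: str, q: int) -> int:
    """Sum over all q-grams v of |G_x(v) - G_y(v)|, where G_s(v) counts
    the occurrences of v in s. A q-gram absent from a profile counts as 0."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


# One substitution ("c" -> "d") removes 2 bigrams of x and adds 2 new ones in y.
print(qgram_distance("abcde", "abdde", 2))  # 4
```

Building the two profiles takes a single pass over each string (times q for the slicing), with no quadratic dynamic-programming table. Since a single edit operation can change the q-gram distance by at most 2q, the value D_q(x, y) / (2q) never exceeds the unconstrained edit distance; in the example, 4 / 4 = 1, matching the single substitution.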
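A constrained edit distance of the kind described, with limits on runs of consecutive insertions and deletions, can be computed by dynamic programming. The sketch below is one possible formulation, not necessarily the authors' algorithm. It assumes unit costs for insertion, deletion and substitution, and it assumes that a run of insertions (or deletions) is ended by any other operation, including a match. The function name `constrained_edit_distance` and the parameters `max_ins` and `max_del` are hypothetical. When no edit sequence satisfies the constraints, it returns `math.inf`.

```python
import math


def constrained_edit_distance(x: str, y: str, max_ins: int, max_del: int) -> float:
    """Unit-cost edit distance from x to y in which no run of consecutive
    insertions is longer than max_ins and no run of consecutive deletions is
    longer than max_del. Returns math.inf if no admissible edit sequence exists."""
    n, m = len(x), len(y)
    inf = math.inf
    # M[i][j]: cheapest way to turn x[:i] into y[:j] whose last operation is a
    # match or substitution (M[0][0] = 0 stands for the empty edit sequence).
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    # I[i][j][k]: same, but ending in a run of exactly k + 1 insertions.
    I = [[[inf] * max_ins for _ in range(m + 1)] for _ in range(n + 1)]
    # D[i][j][k]: same, but ending in a run of exactly k + 1 deletions.
    D = [[[inf] * max_del for _ in range(m + 1)] for _ in range(n + 1)]
    M[0][0] = 0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # Match (cost 0) or substitution (cost 1) after any state; ends any run.
                best_prev = min([M[i - 1][j - 1], *I[i - 1][j - 1], *D[i - 1][j - 1]])
                M[i][j] = best_prev + (x[i - 1] != y[j - 1])
            if j > 0 and max_ins > 0:
                # Insert y[j-1]: start a new run, or extend a shorter insertion run.
                I[i][j][0] = min([M[i][j - 1], *D[i][j - 1]]) + 1
                for k in range(1, max_ins):
                    I[i][j][k] = I[i][j - 1][k - 1] + 1
            if i > 0 and max_del > 0:
                # Delete x[i-1]: start a new run, or extend a shorter deletion run.
                D[i][j][0] = min([M[i - 1][j], *I[i - 1][j]]) + 1
                for k in range(1, max_del):
                    D[i][j][k] = D[i - 1][j][k - 1] + 1

    return min([M[n][m], *I[n][m], *D[n][m]])


print(constrained_edit_distance("abc", "abcdef", 3, 0))  # 3
print(constrained_edit_distance("abc", "abcdef", 2, 2))  # 4
```

In the second call, the three trailing insertions are not allowed as one run, so the cheapest admissible sequence breaks the run with a substitution and costs 4 instead of 3. Tracking run lengths enlarges the state space, so this sketch takes O(nm(max_ins + max_del)) time for strings of lengths n and m, compared with O(nm) for the ordinary edit distance.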