Measuring text similarity with dynamic time warping

Authors:
Michael Matuschek;Tim Schlüter;Stefan Conrad
Affiliations:
Heinrich-Heine-Universität, Düsseldorf, Germany;Heinrich-Heine-Universität, Düsseldorf, Germany;Heinrich-Heine-Universität, Düsseldorf, Germany
Venue:
IDEAS '08 Proceedings of the 2008 international symposium on Database engineering & applications
Year:
2008

References:
Dynamic programming algorithm optimization for spoken word recognition. Readings in speech recognition.
Scaling up dynamic time warping for datamining applications. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining.
A vector space model for automatic indexing. Communications of the ACM.
Towards an error free plagarism detection process. Proceedings of the 6th annual conference on Innovation and technology in computer science education.
Modern Information Retrieval.
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases. VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases.
Calculating similarity between texts using graph-based text representation model. Proceedings of the thirteenth ACM international conference on Information and knowledge management.
Sentence-based natural language plagiarism detection. Journal on Educational Resources in Computing (JERIC).
Plagiarism Detection through Multilevel Text Comparison. AXMEDIS '06 Proceedings of the Second International Conference on Automated Production of Cross Media Content for Multi-Channel Distribution.
A natural language processing approach to automatic plagiarism detection. Proceedings of the 8th ACM SIGITE conference on Information technology education.
Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis.

Cited by:
Boundary-based lower-bound functions for dynamic time warping and their indexing. Information Sciences: an International Journal.
Similarity search for time series based on efficient warping measure. DM-IKM '12 Proceedings of the Data Mining and Intelligent Knowledge Management Workshop.

Abstract

In this work, we describe an approach that makes written texts comparable using temporal data mining methods. This idea was proposed in earlier work [11], but to our knowledge it has not yet been investigated in depth. The basic idea is to derive artificial time series from texts by sliding a window over each text and counting the occurrences of relevant keywords within it; the resulting time series can then be compared with time series analysis techniques. In this work, we use the Dynamic Time Warping (DTW) distance [3]. Through extensive testing, we derived suitable parameters for computing the time series. We show that this approach may aid in recognizing similar texts, since the observed distances between similar documents are significantly lower than those between unrelated texts. The approach may also be particularly suitable for comparing texts written in different languages, since only the translations of the keywords need to be known.
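The abstract does not state the exact windowing scheme or DTW settings, so the following is only a minimal Python sketch of the general pipeline under our own assumptions. Tokens are lowercased words matched with the regex `\w+`. Each window holds `window` tokens and advances by `step` tokens, and a single combined count over all keywords is recorded per window. DTW uses the absolute difference as the local cost, without a warping-window constraint. The function names `keyword_series` and `dtw_distance` and the default parameter values are hypothetical; they are not the authors' code or the parameters derived in the paper.

```python
import math
import re


def keyword_series(text, keywords, window=50, step=10):
    """Hypothetical helper: slide a window of `window` tokens over `text`,
    advancing `step` tokens at a time, and record how many tokens in each
    window belong to `keywords`. Returns the counts as a list (the series)."""
    tokens = re.findall(r"\w+", text.lower())
    keys = {k.lower() for k in keywords}
    hits = [1 if tok in keys else 0 for tok in tokens]
    if len(hits) <= window:
        # Text no longer than one window: the series has a single point.
        return [sum(hits)]
    return [sum(hits[i:i + window])
            for i in range(0, len(hits) - window + 1, step)]


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences,
    using |x - y| as the local cost and no warping-window constraint.
    Runs in O(len(a) * len(b)) time and space."""
    n, m = len(a), len(b)
    # d[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # advance in a only
                                 d[i][j - 1],      # advance in b only
                                 d[i - 1][j - 1])  # advance in both
    return d[n][m]
```

Comparing two documents then amounts to `dtw_distance(keyword_series(doc1, kw), keyword_series(doc2, kw))`, where smaller values suggest more similar texts. For a cross-language comparison, each document would be converted with the keyword list in its own language (for instance, German translations of the English keywords for a German text), in line with the abstract's remark that only the keyword translations need to be known. The unconstrained quadratic DTW was chosen here for clarity rather than speed.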