Dynamic programming algorithm optimization for spoken word recognition
Readings in speech recognition
Scaling up dynamic time warping for datamining applications
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
A vector space model for automatic indexing
Communications of the ACM
Towards an error free plagarism detection process
Proceedings of the 6th annual conference on Innovation and technology in computer science education
Modern Information Retrieval
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Calculating similarity between texts using graph-based text representation model
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Sentence-based natural language plagiarism detection
Journal on Educational Resources in Computing (JERIC)
Plagiarism Detection through Multilevel Text Comparison
AXMEDIS '06 Proceedings of the Second International Conference on Automated Production of Cross Media Content for Multi-Channel Distribution
A natural language processing approach to automatic plagiarism detection
Proceedings of the 8th ACM SIGITE conference on Information technology education
Toward accurate dynamic time warping in linear time and space
Intelligent Data Analysis
Boundary-based lower-bound functions for dynamic time warping and their indexing
Information Sciences: an International Journal
Similarity search for time series based on efficient warping measure
DM-IKM '12 Proceedings of the Data Mining and Intelligent Knowledge Management Workshop
In this work, we describe an approach that aims to make typed texts comparable using temporal data mining methods. This idea was proposed in earlier work [11], but to our knowledge no significant research on the subject has been done since. The basic idea is to derive artificial time series from texts by counting the occurrences of relevant keywords in a sliding window moved across each text; the resulting time series can then be compared with techniques from time series analysis. In this case, the Dynamic Time Warping distance [3] was used. Through extensive testing, adequate parameters for the time series calculation were derived, and we show that this approach might aid in the recognition of similar texts, since the observed distances between similar documents are significantly lower than those between unrelated texts. Our idea might also be especially suitable for comparing texts in different languages, since only the keyword translations must be known.
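The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `keyword_series` and `dtw_distance`, the tokenization rule, the default window and step sizes, and the absolute-difference local cost are all assumptions chosen for illustration, and the abstract does not state the parameters that were actually derived. The DTW routine is the textbook dynamic program without any warping-window constraint.

```python
import re
from typing import List, Sequence, Set


def keyword_series(text: str, keywords: Set[str],
                   window: int = 50, step: int = 10) -> List[int]:
    """Count keyword hits in a sliding window of `window` tokens.

    Assumptions (not from the paper): tokens are lowercase word
    characters, `keywords` is already lowercase, and the window
    advances by `step` tokens.
    """
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) <= window:
        # Text shorter than one window: return a single count.
        return [sum(t in keywords for t in tokens)]
    return [
        sum(t in keywords for t in tokens[i:i + window])
        for i in range(0, len(tokens) - window + 1, step)
    ]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained Dynamic Time Warping distance with |x - y| cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # prev[j] holds the accumulated cost for the previous row of the
    # DTW matrix; index 0 is the virtual border cell.
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

To compare two documents under these assumptions, compute `keyword_series` for each with the same keyword set and parameters, then pass both series to `dtw_distance`; a smaller value indicates more similar keyword profiles. For a cross-language comparison, each text would get its own keyword set containing the translations of the same terms.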