String distance metrics for reference matching and search query correction

Authors:
Jakub Piskorski; Marcin Sydow
Affiliations:
Joint Research Center of the European Commission, Web and Language Technology Group of IPSC, Ispra, VA, Italy; Polish-Japanese Institute of Information Technology, Department of Intelligent Systems, Warsaw, Poland
Venue:
BIS'07 Proceedings of the 10th international conference on Business information systems
Year:
2007

Abstract

String distance metrics are widely used in applications that process textual data. This paper explores their usability for two tasks in the context of highly inflective languages, with a particular focus on Polish: reference matching and the automatic correction of misspelled search engine queries. Results of numerous experiments in different scenarios are presented, and they reveal some preferred metrics. Surprisingly good results were observed for correcting misspelled search engine queries. Nevertheless, a more in-depth analysis is necessary to achieve improvements. The work reported here constitutes a good point of departure for further research on this topic.
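
The abstract does not name the metrics that were evaluated, so the following is only a minimal sketch of the general idea of distance-based query correction. It assumes plain Levenshtein edit distance as the metric and a small hypothetical vocabulary of Polish place names; the function names `levenshtein` and `correct` are illustrative and do not come from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def correct(query: str, vocabulary: list[str]) -> str:
    """Return the vocabulary entry closest to query.

    Ties are resolved in favour of the entry that appears first.
    """
    return min(vocabulary, key=lambda word: levenshtein(query, word))


# Hypothetical vocabulary, chosen only for illustration.
vocabulary = ["kraków", "warszawa", "gdańsk"]
suggestion = correct("warzsawa", vocabulary)
```

Here the misspelled query "warzsawa" is at edit distance 2 from "warszawa" and much further from the other entries, so `suggestion` is "warszawa". A realistic system for an inflective language would need a far larger vocabulary and probably a metric that is more tolerant of suffix variation, which this sketch does not attempt.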