MEDITE: a unilingual textual aligner

Authors: Julien Bourdaillet, Jean-Gabriel Ganascia
Affiliation (both authors): Université Pierre et Marie Curie – Laboratoire d’Informatique de Paris 6, Paris, France
Venue: FinTAL'06, Proceedings of the 5th International Conference on Advances in Natural Language Processing
Year: 2006

Abstract

This paper addresses a natural language text alignment problem that arises in textual genetic criticism, a humanities discipline in which different versions of a text must be compared. The paper shows that this task is hard for two reasons: the versions can differ greatly from one another, and texts with many internal repetitions pose specific difficulties. MEDITE is a natural language text aligner that compares texts written in the same language. It detects modifications at the character level, whereas related applications either operate only at the word level or give poor results at the character level. It also detects moved blocks of text, a capability that follows from our formalism based on edit distance with moves. The algorithm is closely related to sequence alignment in bioinformatics: it uses similar building blocks and applies them to this natural language processing task. A benchmark comparing MEDITE with other aligners shows that our approach outperforms existing ones, especially in hard cases.
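
To make the notion of character-level alignment with moved blocks concrete, here is a minimal Python sketch. It is not MEDITE's algorithm, which the abstract does not specify. It is a simple greedy stand-in, assuming that common blocks are paired off longest-first and that the heaviest order-preserving chain of blocks is treated as unmoved. The names `align`, `unmatched_runs` and `min_len` are hypothetical and chosen for illustration; only `difflib.SequenceMatcher` comes from the Python standard library.

```python
from difflib import SequenceMatcher


def unmatched_runs(text, spans):
    """Return the substrings of text not covered by any (start, length) span."""
    runs, pos = [], 0
    for start, length in sorted(spans):
        if start > pos:
            runs.append(text[pos:start])
        pos = start + length
    if pos < len(text):
        runs.append(text[pos:])
    return runs


def align(a, b, min_len=3):
    """Greedy character-level alignment with block moves (illustrative only)."""
    la, lb = list(a), list(b)
    blocks = []  # (start_in_a, start_in_b, length), never overlapping
    while True:
        matcher = SequenceMatcher(None, la, lb, autojunk=False)
        m = matcher.find_longest_match(0, len(la), 0, len(lb))
        if m.size < min_len:
            break
        blocks.append((m.a, m.b, m.size))
        # Mask matched characters with unique tuples so they cannot match again.
        for k in range(m.size):
            la[m.a + k] = ("a", m.a + k)
            lb[m.b + k] = ("b", m.b + k)
    blocks.sort()  # order of appearance in a

    # Heaviest chain of blocks that keeps the same order in a and in b:
    # blocks on the chain are "invariant", the others are "moved".
    n = len(blocks)
    best = [blk[2] for blk in blocks]
    prev = [-1] * n
    for i in range(n):
        for j in range(i):
            if blocks[j][1] < blocks[i][1] and best[j] + blocks[i][2] > best[i]:
                best[i] = best[j] + blocks[i][2]
                prev[i] = j
    keep = set()
    i = max(range(n), key=best.__getitem__) if n else -1
    while i != -1:
        keep.add(i)
        i = prev[i]

    return {
        "invariant": [a[s:s + ln] for k, (s, _, ln) in enumerate(blocks) if k in keep],
        "moved": [a[s:s + ln] for k, (s, _, ln) in enumerate(blocks) if k not in keep],
        "deleted": unmatched_runs(a, [(s, ln) for s, _, ln in blocks]),
        "inserted": unmatched_runs(b, [(s, ln) for _, s, ln in blocks]),
    }


result = align("one two three", "three one twa")
# result["invariant"] == ["one tw"], result["moved"] == ["three"]
# result["deleted"] == ["o "],      result["inserted"] == [" ", "a"]
```

In this toy example the word "three" is reported as a moved block instead of a deletion plus an insertion, and the change from "two" to "twa" is localized to single characters, which a word-level comparison could not do. The greedy loop rescans both texts on every iteration, so the sketch is only suitable for short inputs.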