SCA: phonetic alignment based on sound classes

Authors:
Johann-Mattis List
Affiliations:
Heinrich Heine University Düsseldorf, Germany
Venue:
ESSLLI'10 Proceedings of the 2010 international conference on New Directions in Logic, Language and Computation
Year:
2010

Cites:
An algorithm to align words for historical comparison. Computational Linguistics.
Algorithms on strings, trees, and sequences: computer science and computational biology.
The String-to-String Correction Problem. Journal of the ACM (JACM).
Algorithms for language reconstruction.
Multiple sequence alignments in linguistics. LaTeCH-SHELT&R '09 Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education.

Cited by:
LexStat: automatic detection of cognates in multilingual wordlists. EACL 2012 Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH.


Abstract

In this paper I present the most recent version of the SCA method for pairwise and multiple alignment analyses. Unlike previously proposed alignment methods, SCA is based on a novel framework of sequence alignment that combines new approaches to sequence modeling in historical linguistics with recent developments in computational biology. Compared with earlier versions of SCA [1,2], the new version introduces several modifications that significantly improve the performance and the application range of the algorithm: a new sound class model was defined that works well on highly divergent sequences; the algorithm for pairwise alignment was modified to be sensitive to secondary sequence structures such as syllable boundaries; and an algorithm for pre-processing the data in multiple alignment analyses [3] was included to compensate for the bias resulting from progressive alignment. To test the method, a new gold standard for pairwise and multiple alignment analyses was created, consisting of 45,947 sequences covering a total of 435 different taxa from six language families.
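The general idea of aligning words through sound classes can be illustrated with a minimal sketch. It is not the SCA implementation: the sound class table below is a small toy mapping invented for illustration, the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions, and the alignment routine is plain Needleman-Wunsch global alignment without the syllable-boundary sensitivity or the pre-processing step the abstract mentions. The names `SOUND_CLASSES`, `to_classes`, and `align` are hypothetical.

```python
# Toy sound class table: an assumption for illustration, NOT the SCA model.
SOUND_CLASSES = {
    "p": "P", "b": "P", "f": "P", "v": "P",
    "t": "T", "d": "T",
    "k": "K", "g": "K",
    "s": "S", "z": "S",
    "m": "M", "n": "N",
    "r": "R", "l": "R",
    "a": "V", "e": "V", "i": "V", "o": "V", "u": "V",
}


def to_classes(word):
    """Map each segment to its sound class; unknown segments stay unchanged."""
    return [SOUND_CLASSES.get(ch, ch) for ch in word]


def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two words, scoring on sound classes instead of segments.

    Returns (aligned_a, aligned_b, score); gaps are written as "-".
    """
    ca, cb = to_classes(a), to_classes(b)
    n, m = len(a), len(b)

    # Fill the dynamic programming matrix (Needleman-Wunsch).
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if ca[i - 1] == cb[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # (mis)match
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if ca[i - 1] == cb[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Made-up strings: "t"/"d" and "g"/"k" share a class, so they count as matches.
print(align("tag", "dak"))
print(align("tag", "da"))
```

Because scoring operates on classes rather than on the segments themselves, segments that differ on the surface but fall into the same class are treated as matching, which is what makes the approach tolerant of regular sound differences between related words.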