Can corpus based measures be used for comparative study of languages?

  • Authors:
  • Anil Kumar Singh; Harshit Surana

  • Affiliations:
  • International Institute of Information Technology, Hyderabad, India; International Institute of Information Technology, Hyderabad, India

  • Venue:
  • SigMorPhon '07: Proceedings of the Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology
  • Year:
  • 2007

Abstract

Quantitative measurement of inter-language distance is a useful technique for studying diachronic and synchronic relations between languages. Such measures have been used successfully for purposes like deriving language taxonomies and language reconstruction, but they have mostly been applied to handcrafted word lists. Can we instead use corpus-based measures for the comparative study of languages? In this paper we try to answer this question. We use three corpus-based measures, present the results obtained from them, and show how these results relate to linguistic and historical knowledge. We argue that the answer is yes and that such studies can provide or validate linguistic and computational insights.