The automatic identification of lexical variation between language varieties

  • Authors:
  • Yves Peirsman; Dirk Geeraerts; Dirk Speelman

  • Affiliations:
  • Research Foundation – Flanders (FWO), Egmontstraat 5, 1000 Brussels, Belgium, email: yves.peirsman@arts.kuleuven.be; and Quantitative Lexicology and Variational Linguistics (QLVL), University ...; Quantitative Lexicology and Variational Linguistics (QLVL), University of Leuven, Blijde-Inkomststraat 21, P.O. Box 3308, 3000 Leuven, Belgium, email: dirk.geeraerts@arts.kuleuven.be, dirk.speelman@ ...

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2010

Abstract

Languages are not uniform. Speakers of different language varieties use certain words differently: more or less frequently, or with different meanings. We argue that distributional semantics is the ideal framework for investigating such lexical variation. We address two research questions and present our analysis of the lexical variation between Belgian Dutch and Netherlandic Dutch. The first question involves a classic application of distributional models: the automatic retrieval of synonyms. We use corpora of two different language varieties to identify the Netherlandic Dutch synonyms for a set of typically Belgian words. The second question concerns the automatic identification of words that are typical of a given lect, either because of their high frequency or because of their divergent meaning. Overall, we show that distributional models are able to identify more lectal markers than traditional keyword methods. Distributional models are also biased towards a different type of variation than keyword methods are. In summary, our results demonstrate how distributional semantics can support research in variational linguistics, with possible future applications in lexicography or terminology extraction.
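The abstract names two techniques without spelling them out: retrieving cross-variety synonyms with a distributional model, and a frequency-based keyword method as the point of comparison. The Python sketch below is not the authors' implementation. It assumes raw co-occurrence counts in a symmetric window of two words, cosine similarity between context vectors, and Dunning's log-likelihood (G2) as one representative keyword statistic. The helper names (`context_vectors`, `cross_variety_synonym`, `log_likelihood`) and the toy sentences are hypothetical illustrations, not data or results from the study.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens
    of it in the same sentence (raw co-occurrence counts)."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cross_variety_synonym(word, source_vecs, target_vecs):
    """Return the target-variety word whose context vector is most similar
    to `word`'s vector in the source variety, excluding `word` itself."""
    query = source_vecs.get(word, Counter())
    candidates = (w for w in target_vecs if w != word)
    return max(candidates, key=lambda w: cosine(query, target_vecs[w]))


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) keyness of a word's frequency in corpus A vs. B."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical toy corpora, one per variety (not from the study).
BE = ["de melk staat in de frigo", "zet de boter in de frigo",
      "mijn gsm ligt op tafel", "ik bel met mijn gsm"]
NL = ["de melk staat in de koelkast", "zet de boter in de koelkast",
      "mijn mobieltje ligt op tafel", "ik bel met mijn mobieltje"]

be_vecs, nl_vecs = context_vectors(BE), context_vectors(NL)
print(cross_variety_synonym("frigo", be_vecs, nl_vecs))  # koelkast
print(cross_variety_synonym("gsm", be_vecs, nl_vecs))    # mobieltje

be_counts = Counter(t for s in BE for t in s.split())
nl_counts = Counter(t for s in NL for t in s.split())
n_be, n_nl = sum(be_counts.values()), sum(nl_counts.values())
print(round(log_likelihood(be_counts["frigo"], n_be,
                           nl_counts["frigo"], n_nl), 3))  # 2.773
```

The cross-variety comparison works here only because the context words (de, in, mijn, ...) are shared by both varieties, so vectors built from separate corpora fall into one space. A context word that occurs in just one variety contributes nothing to the similarity, which is one reason a realistic setup would likely need much larger corpora. Note also the contrast the abstract draws: the log-likelihood score reacts only to frequency differences, whereas the context vectors can also reflect differences in usage.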