Mining and classification of neologisms in Persian blogs

  • Authors:
  • Karine Megerdoomian; Ali Hadjarian

  • Affiliations:
  • The MITRE Corporation, McLean, VA; The MITRE Corporation, McLean, VA

  • Venue:
  • CALC '10 Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity
  • Year:
  • 2010

Abstract

The exponential growth of the Persian blogosphere and the increasing number of neologisms pose a major challenge for NLP applications that process Persian blogs. This paper describes a method for extracting and classifying newly constructed words and borrowings from Persian blog posts. An analysis of how neologisms occur across five distinct topic categories points to a correspondence between the topic domain and the type of neologism most commonly encountered. The results suggest that the automatic detection and processing of neologisms should be approached differently depending on the domain of application.