Resources for Turkish morphological processing

Authors:
Haşim Sak; Tunga Güngör; Murat Saraçlar
Affiliations:
Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey 34342 (Haşim Sak, Tunga Güngör); Department of Electrical & Electronic Engineering, Boğaziçi University, Istanbul, Turkey 34342 (Murat Saraçlar)
Venue:
Language Resources and Evaluation
Year:
2011

References (30)

Regular models of phonological rule systems. Computational Linguistics, Special issue on computational phonology.
Large Margin Classification Using the Perceptron Algorithm. Machine Learning, The Eleventh Annual Conference on Computational Learning Theory.
The combinatory morphemic lexicon. Computational Linguistics.
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. ECML '01 Proceedings of the 12th European Conference on Machine Learning.
A large-scale study of the evolution of web pages. WWW '03 Proceedings of the 12th international conference on World Wide Web.
Introduction to the special issue on the web as corpus. Computational Linguistics, Special issue on web as corpus.
The Web as a parallel corpus. Computational Linguistics, Special issue on web as corpus.
Finite-state transducers in language and speech processing. Computational Linguistics.
Morphological disambiguation by voting constraints. ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics.
A general computational model for word-form recognition and production. ACL '84 Proceedings of the 10th International Conference on Computational Linguistics and 22nd annual meeting on Association for Computational Linguistics.
Combining stochastic and rule-based methods for disambiguation in agglutinative languages. COLING '98 Proceedings of the 17th international conference on Computational linguistics, Volume 1.
Tagging inflective languages: prediction of morphological categories for a rich, structured tagset. COLING '98 Proceedings of the 17th international conference on Computational linguistics, Volume 1.
Two-level morphology with composition. COLING '92 Proceedings of the 14th conference on Computational linguistics, Volume 1.
New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron. ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.
Web-based models for natural language processing. ACM Transactions on Speech and Language Processing (TSLP).
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing, Volume 10.
Using the web to overcome data sparseness. EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing, Volume 10.
Web crawling ethics revisited: Cost, privacy, and denial of service. Journal of the American Society for Information Science and Technology.
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.
Learning morphological disambiguation rules for Turkish. HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics.
Googleology is Bad Science. Computational Linguistics.
The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering.
Domain classification of technical terms using the Web. Systems and Computers in Japan.
Tree-Traversing Ant Algorithm for term clustering based on featureless similarities. Data Mining and Knowledge Discovery.
Constructing Web Corpora through Topical Web Partitioning for Term Recognition. AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence.
Morphological Disambiguation of Turkish Text with Perceptron Algorithm. CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing.
A probabilistic framework for automatic term recognition. Intelligent Data Analysis.
The architecture and the implementation of a finite state pronunciation lexicon for Turkish. Computer Speech and Language.
A stochastic finite-state morphological parser for Turkish. ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers.
OpenFst: a general and efficient weighted finite-state transducer library. CIAA '07 Proceedings of the 12th international conference on Implementation and application of automata.

Cited by (2)

Ontology learning from text: A look back and into the future. ACM Computing Surveys (CSUR).
Morpheme segmentation in the METU-Sabancı Turkish treebank. LAW VI '12 Proceedings of the Sixth Linguistic Annotation Workshop.

Abstract

We present a set of language resources and tools for exploiting Turkish morphology in natural language processing applications: a morphological parser, a morphological disambiguator, and a text corpus. The morphological parser is a state-of-the-art implementation of Turkish morphology based on finite-state transducers. The disambiguator is based on the averaged perceptron algorithm and achieves the best accuracy reported for Turkish in the literature. The text corpus was compiled from the web, contains about 500 million tokens, and is the largest published Turkish web corpus.
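The disambiguator is described as being based on the averaged perceptron. The sketch below illustrates that technique under stated assumptions and is not the authors' implementation. It assumes each word arrives with a list of candidate analyses (as a morphological parser would produce them), uses a hypothetical feature set (root, full tag string, and a bigram of adjacent tag strings), and decodes with a first-order Viterbi search. The tag strings are illustrative only. The ambiguous surface form "kitabı" can be read as accusative "the book" (kitap+...+Pnon+Acc) or as possessive "his/her book" (kitap+...+P3sg+Nom), and in this toy data only the following word decides which reading is correct.

```python
from collections import defaultdict

BOS = "<s>+BOS"  # hypothetical sentence-start pseudo-analysis


def features(prev_analysis, analysis):
    # Hypothetical feature set; analyses are illustrative "root+Tag1+Tag2..." strings.
    root, _, tags = analysis.partition("+")
    prev_tags = prev_analysis.partition("+")[2]
    return [f"root={root}", f"tags={tags}", f"bigram={prev_tags}|{tags}"]


def path_features(path):
    # All features along one sequence of chosen analyses.
    feats, prev = [], BOS
    for analysis in path:
        feats.extend(features(prev, analysis))
        prev = analysis
    return feats


class AveragedPerceptron:
    def __init__(self):
        self.w = defaultdict(float)  # current weights
        self.u = defaultdict(float)  # running sum of (example counter * update)
        self.c = 1                   # example counter

    @staticmethod
    def score(feats, weights):
        return sum(weights.get(f, 0.0) for f in feats)

    def decode(self, candidates, weights):
        # Viterbi search: best[k] = (score, path) of the best path ending in prevs[k].
        best, prevs = [(0.0, [])], [BOS]
        for cands in candidates:
            new_best = []
            for a in cands:
                s, path = max(
                    ((best[k][0] + self.score(features(prevs[k], a), weights), best[k][1])
                     for k in range(len(prevs))),
                    key=lambda pair: pair[0],
                )
                new_best.append((s, path + [a]))
            best, prevs = new_best, cands
        return max(best, key=lambda pair: pair[0])[1]

    def train(self, data, epochs=5):
        for _ in range(epochs):
            for candidates, gold in data:
                pred = self.decode(candidates, self.w)
                if pred != gold:
                    # Standard perceptron update: phi(gold) - phi(predicted).
                    delta = defaultdict(float)
                    for f in path_features(gold):
                        delta[f] += 1.0
                    for f in path_features(pred):
                        delta[f] -= 1.0
                    for f, d in delta.items():
                        if d:
                            self.w[f] += d
                            self.u[f] += self.c * d
                self.c += 1

    def averaged_weights(self):
        # Averaging trick: w - u / c equals the mean of all intermediate weight vectors.
        return {f: self.w[f] - self.u[f] / self.c for f in self.w}


# Toy data: (candidate analyses per word, gold analyses). Illustrative only.
data = [
    ([["kitap+Noun+A3sg+Pnon+Acc", "kitap+Noun+A3sg+P3sg+Nom"], ["oku+Verb+Pos+Past+A1sg"]],
     ["kitap+Noun+A3sg+Pnon+Acc", "oku+Verb+Pos+Past+A1sg"]),
    ([["kitap+Noun+A3sg+Pnon+Acc", "kitap+Noun+A3sg+P3sg+Nom"], ["güzel+Adj"]],
     ["kitap+Noun+A3sg+P3sg+Nom", "güzel+Adj"]),
]
model = AveragedPerceptron()
model.train(data, epochs=5)
avg = model.averaged_weights()
for candidates, _ in data:
    print(model.decode(candidates, avg))
```

Decoding with the averaged weights rather than the final ones is what distinguishes the averaged perceptron from the plain perceptron; it generally makes the model less sensitive to the last few updates. The `w - u / c` form obtains that average without storing every intermediate weight vector.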