HunPos: an open source trigram tagger

Authors: Péter Halácsy; András Kornai; Csaba Oravecz
Affiliations: Budapest U. of Technology, Budapest, Stoczek; MetaCarta Inc., Cambridge, MA; Institute of Linguistics, Budapest, Benczur
Venue: ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Year: 2007

References

Grammatical category disambiguation by statistical optimization. Computational Linguistics.
TnT: a statistical part-of-speech tagger. ANLC '00: Proceedings of the sixth conference on Applied natural language processing.
A stochastic parts program and noun phrase parser for unrestricted text. ANLC '88: Proceedings of the second conference on Applied natural language processing.
Statistical morphological disambiguation for agglutinative languages. COLING '00: Proceedings of the 18th conference on Computational linguistics, Volume 1.
Serial combination of rules and statistics: a case study in Czech tagging. ACL '01: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics.
Feature-rich part-of-speech tagging with a cyclic dependency network. NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1.
Part of speech tagging in context. COLING '04: Proceedings of the 20th international conference on Computational Linguistics.
Context-based morphological disambiguation with random fields. HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing.

Abstract

In the world of non-proprietary NLP software, the standard, and perhaps the best, HMM-based POS tagger is TnT (Brants, 2000). We argue here that some of the criticism aimed at HMM performance on languages with rich morphology should more properly be directed at TnT's peculiar license, free but not open source, since it is precisely the implementation details hidden from the user that hold the key to improved POS tagging across a wider variety of languages. We present HunPos, a free and open source (LGPL-licensed) alternative that the user can tune to fully exploit the potential of HMM architectures, offering performance comparable to more complex models while preserving the ease and speed of training and tagging.
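To make the kind of model under discussion concrete, here is a minimal sketch of a second-order (trigram) HMM tagger: tag trigram probabilities smoothed by linear interpolation with bigram and unigram estimates, and Viterbi decoding over pairs of tags. This is not HunPos's or TnT's code. The function names, the `<s>` padding tag, the fixed interpolation weights, and the constant floor for unseen word/tag pairs are all assumptions made for illustration; TnT, by contrast, estimates its interpolation weights by deleted interpolation and handles unknown words with suffix analysis. Sentence-end transitions are omitted for brevity.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical padding tag for the two positions before a sentence


def train(tagged_sentences):
    """Collect tag n-gram counts and word/tag counts from (word, tag) sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    ctx1, ctx2 = Counter(), Counter()  # history counts, used as denominators
    emit = defaultdict(Counter)
    for sent in tagged_sentences:
        tags = [START, START] + [t for _, t in sent]
        for word, tag in sent:
            emit[tag][word] += 1
        for i in range(2, len(tags)):
            a, b, c = tags[i - 2], tags[i - 1], tags[i]
            uni[c] += 1
            bi[(b, c)] += 1
            tri[(a, b, c)] += 1
            ctx1[b] += 1
            ctx2[(a, b)] += 1
    return uni, bi, tri, ctx1, ctx2, emit


def viterbi(words, model, lambdas=(0.1, 0.3, 0.6), unseen=1e-6):
    """Return the most likely tag sequence under a trigram HMM."""
    uni, bi, tri, ctx1, ctx2, emit = model
    tags = list(uni)
    n_tokens = sum(uni.values())
    l1, l2, l3 = lambdas  # hypothetical fixed weights (TnT estimates these)

    def trans(a, b, c):
        # P(c | a, b) as a linear interpolation of unigram, bigram, trigram estimates.
        p = l1 * uni[c] / n_tokens
        if ctx1[b]:
            p += l2 * bi[(b, c)] / ctx1[b]
        if ctx2[(a, b)]:
            p += l3 * tri[(a, b, c)] / ctx2[(a, b)]
        return p

    def emission(t, w):
        # Relative frequency P(w | t); unseen pairs get a constant floor (assumption).
        count = emit[t][w]
        return count / uni[t] if count else unseen

    # State = (previous tag, current tag); value = (log prob, best path so far).
    best = {(START, START): (0.0, [])}
    for w in words:
        new_best = {}
        for (a, b), (score, path) in best.items():
            for c in tags:
                s = score + math.log(trans(a, b, c) * emission(c, w))
                if (b, c) not in new_best or s > new_best[(b, c)][0]:
                    new_best[(b, c)] = (s, path + [c])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]


corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train(corpus)
print(viterbi(["the", "dog", "sleeps"], model))  # ['DET', 'NOUN', 'VERB']
```

Keeping the full path in each state stands in for explicit backpointers: it is simpler to read at the cost of extra memory. Because the state is a pair of tags, decoding costs on the order of the number of tags cubed per word, which is why trigram HMM taggers remain fast to train (pure counting) and to run.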