Words unknown to the lexicon present a substantial problem for NLP modules that rely on morphosyntactic information, such as part-of-speech taggers and syntactic parsers. In this paper we present a technique for the fully automatic acquisition of rules that guess possible part-of-speech tags for unknown words from their starting and ending segments. The rules are learned from a general-purpose lexicon and word frequencies collected from a raw corpus. Three complementary sets of word-guessing rules are statistically induced: prefix morphological rules, suffix morphological rules, and ending-guessing rules. Using the proposed technique, unknown-word-guessing rule sets were induced and integrated into a stochastic tagger and a rule-based tagger, which were then applied to texts containing unknown words.
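
To make the idea of ending-guessing rules concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes a toy lexicon that maps each word to a set of tags, and it keeps an ending as a rule only when one tag set dominates the words sharing that ending. The function names (`induce_ending_rules`, `guess_tags`), the thresholds (`min_count`, `min_precision`), and the fallback tag are all hypothetical. The corpus word frequencies mentioned in the abstract are left out for brevity.

```python
from collections import Counter, defaultdict


def induce_ending_rules(lexicon, max_len=3, min_count=2, min_precision=0.8):
    """Induce ending -> tag-set rules from a lexicon (word -> set of tags).

    Illustrative assumption: an ending becomes a rule when it occurs with at
    least `min_count` lexicon words and its most frequent tag set covers at
    least `min_precision` of them.
    """
    counts = defaultdict(Counter)
    for word, tags in lexicon.items():
        for n in range(1, max_len + 1):
            if len(word) > n:  # keep at least one character before the ending
                counts[word[-n:]][frozenset(tags)] += 1

    rules = {}
    for ending, tag_counts in counts.items():
        total = sum(tag_counts.values())
        best_tags, best = tag_counts.most_common(1)[0]
        if total >= min_count and best / total >= min_precision:
            rules[ending] = best_tags
    return rules


def guess_tags(word, rules, max_len=3, default=frozenset({"NN"})):
    """Guess tags for an unknown word, preferring the longest matching ending."""
    for n in range(min(max_len, len(word) - 1), 0, -1):
        tags = rules.get(word[-n:])
        if tags is not None:
            return tags
    return default  # hypothetical fallback when no rule applies


# Toy lexicon, invented for illustration only.
toy_lexicon = {
    "walking": {"VBG"},
    "talking": {"VBG"},
    "running": {"VBG"},
    "quickly": {"RB"},
    "slowly": {"RB"},
    "table": {"NN"},
}
toy_rules = induce_ending_rules(toy_lexicon)
print(sorted(guess_tags("jumping", toy_rules)))
print(sorted(guess_tags("happily", toy_rules)))
```

Trying the longest ending first means a specific ending such as `ing` takes precedence over a shorter and more ambiguous one such as `g`. Prefix rules could be sketched the same way by slicing from the start of the word instead of the end.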