A discriminative alignment model for abbreviation recognition

Authors:
Naoaki Okazaki; Sophia Ananiadou; Jun'ichi Tsujii
Affiliations:
University of Tokyo, Bunkyo-ku, Tokyo, Japan; University of Manchester, Manchester, UK; University of Tokyo, Bunkyo-ku, Tokyo, Japan and University of Manchester, Manchester, UK
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008


Abstract

This paper presents a discriminative alignment model for extracting abbreviations and their full forms as they appear in actual text. The task of abbreviation recognition is formalized as a sequential alignment problem: finding the optimal alignment (the origins of the abbreviation letters) between two strings, the abbreviation and its full form. We design a large number of fine-grained features that directly express the events in which letters produce or do not produce abbreviations. We obtain the optimal combination of features on an aligned abbreviation corpus by using the maximum entropy framework. The experimental results show the usefulness of the alignment model and corpus for improving abbreviation recognition.
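
The idea of scoring alignments with a log-linear (maximum entropy) model can be illustrated with a minimal sketch. It is not the paper's model: the three features, their names, and the hand-set weights below are hypothetical, and candidates are enumerated exhaustively rather than searched efficiently. In the paper the weights are learned from an aligned abbreviation corpus; here they are fixed only to show how a weighted feature sum selects one alignment.

```python
import math

def candidate_alignments(abbr, full):
    """Yield tuples of full-form positions, one per abbreviation letter,
    in strictly increasing order, matching letters case-insensitively."""
    a, f = abbr.lower(), full.lower()

    def rec(i, start):
        if i == len(a):
            yield ()
            return
        for p in range(start, len(f)):
            if f[p] == a[i]:
                for rest in rec(i + 1, p + 1):
                    yield (p,) + rest

    yield from rec(0, 0)

def features(full, alignment):
    """Count hypothetical alignment events (illustrative only)."""
    # Positions where a space-separated word begins.
    starts = [i for i, ch in enumerate(full)
              if ch != " " and (i == 0 or full[i - 1] == " ")]
    counts = {"word_initial": 0, "word_internal": 0, "skipped_word": 0}
    used_words = set()
    for p in alignment:
        if p in starts:
            counts["word_initial"] += 1
        else:
            counts["word_internal"] += 1
        # Identify the word containing position p by its start offset.
        used_words.add(max(s for s in starts if s <= p))
    counts["skipped_word"] = len(starts) - len(used_words)
    return counts

# Hand-set weights, assumed for illustration; a real system would learn them.
WEIGHTS = {"word_initial": 2.0, "word_internal": -1.0, "skipped_word": -1.5}

def best_alignment(abbr, full, weights=WEIGHTS):
    """Return the highest-scoring alignment and its normalized probability,
    or (None, 0.0) if the abbreviation letters cannot be aligned."""
    cands = list(candidate_alignments(abbr, full))
    if not cands:
        return None, 0.0
    scores = [sum(weights[k] * v for k, v in features(full, c).items())
              for c in cands]
    top = max(scores)
    # Softmax over all candidates, shifted by the top score for stability.
    z = sum(math.exp(s - top) for s in scores)
    return cands[scores.index(top)], 1.0 / z

alignment, prob = best_alignment("TTF", "thyroid transcription factor")
print(alignment, round(prob, 2))
```

For the abbreviation "TTF" and the full form "thyroid transcription factor", the sketch finds three candidate alignments and prefers the one that maps each letter to a word-initial position, (0, 8, 22), with probability of roughly 0.94 under these assumed weights.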