Scaled log likelihood ratios for the detection of abbreviations in text corpora

Authors:
Tibor Kiss; Jan Strunk
Affiliations:
Ruhr-Universität Bochum, Bochum; Ruhr-Universität Bochum, Bochum
Venue:
COLING '02: Proceedings of the 19th International Conference on Computational Linguistics, Volume 2
Year:
2002

Abstract

We describe a language-independent, flexible, and accurate method for the detection of abbreviations in text corpora. It is based on the idea that an abbreviation can be viewed as a collocation of a word and the period that follows it, and can therefore be identified with methods for collocation detection such as the log likelihood ratio. Although the log likelihood ratio is known to yield good recall, its precision is poor. We employ scaling factors that strongly improve precision. Experiments with English and German corpora show that abbreviations can be detected with high accuracy.
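The collocation view can be illustrated with Dunning's log likelihood ratio on a 2×2 contingency table. The table counts how often a word type w is followed by a period, compared with all other tokens. The sketch below is a minimal Python illustration, not the paper's implementation. The function names and the toy counts are hypothetical. The length-based `scaled_score` factor in particular is an invented example of the kind of scaling the abstract mentions, not the authors' actual scaling factors.

```python
import math


def log_binomial_likelihood(k: int, n: int, p: float) -> float:
    """Log-likelihood of k successes in n Bernoulli trials with probability p.

    The binomial coefficient is omitted because it cancels in the ratio.
    The 0 * log(0) cases are skipped, so maximum-likelihood estimates
    p = 0 (when k == 0) and p = 1 (when k == n) are handled safely.
    """
    result = 0.0
    if k > 0:
        result += k * math.log(p)
    if n - k > 0:
        result += (n - k) * math.log(1.0 - p)
    return result


def log_likelihood_ratio(c_w_period: int, c_w: int, c_period: int, n_tokens: int) -> float:
    """Dunning-style -2 log lambda for the association of word type w with a following period.

    c_w_period: occurrences of w followed by a period
    c_w:        all occurrences of w
    c_period:   all tokens (any type) followed by a period
    n_tokens:   corpus size in tokens
    Note: this test is two-sided. It also grows when w is followed by a
    period unusually rarely, so a practical detector would also check p1 > p2.
    """
    if not (0 < c_w < n_tokens) or not (0 <= c_w_period <= min(c_w, c_period)):
        raise ValueError("inconsistent counts")
    k1, n1 = c_w_period, c_w                          # w tokens
    k2, n2 = c_period - c_w_period, n_tokens - c_w    # all other tokens
    p1, p2 = k1 / n1, k2 / n2                         # separate estimates (alternative hypothesis)
    p = (k1 + k2) / (n1 + n2)                         # pooled estimate (null hypothesis)
    return 2.0 * (
        log_binomial_likelihood(k1, n1, p1)
        + log_binomial_likelihood(k2, n2, p2)
        - log_binomial_likelihood(k1, n1, p)
        - log_binomial_likelihood(k2, n2, p)
    )


def scaled_score(llr: float, word: str) -> float:
    """HYPOTHETICAL scaling factor: damp the score of long word types,
    on the assumption that abbreviations tend to be short."""
    length = len(word.replace(".", ""))
    return llr * math.exp(-length)


if __name__ == "__main__":
    # Invented toy counts: 1000 tokens, 40 of them followed by a period.
    etc_llr = log_likelihood_ratio(c_w_period=10, c_w=10, c_period=40, n_tokens=1000)
    neutral_llr = log_likelihood_ratio(c_w_period=1, c_w=25, c_period=40, n_tokens=1000)
    print(f"etc: llr={etc_llr:.2f} scaled={scaled_score(etc_llr, 'etc'):.4f}")
    print(f"neutral word: llr={neutral_llr:.2f}")
```

In this toy setup, the word that always precedes a period gets a large ratio. The word whose period rate matches the corpus-wide rate gets a ratio of essentially zero. The multiplicative scaling step shows one way a raw, recall-oriented score can be reshaped to favour precision. The concrete factors used in the paper should be taken from the paper itself.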