Algorithms for the verification of the semantic relation between a compound and a given lexeme

Authors:
Gudrun Kellner;Johannes Grünauer
Affiliations:
Vienna University of Technology, Austria;Vienna University of Technology, Austria
Venue:
Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
Year:
2012

Abstract

Text mining on a lexical basis is quite well developed for the English language. In compounding languages, however, lexicalized words are often a combination of two or more semantic units. New words can be built easily by concatenating existing ones, without any white space in between. This poses a problem for existing search algorithms: such compounds could be of high interest for a search request, but how can one determine whether a compound comprises a given lexeme? A string match can be considered an indication, but does not prove a semantic relation. The same problem arises in lexicon-based approaches, where signal words are defined as lexemes only and need to be identified in all forms of appearance, and hence also as components of a compound. This paper explores the characteristics of compounds and their constituent elements for German, and compares seven algorithms with regard to runtime and error rates. The results of this study are relevant to query analysis and term weighting approaches in information retrieval system design.
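
The gap between a string match and an actual constituent relation can be illustrated with a minimal sketch. It is not one of the seven algorithms compared in the paper; the toy lexicon, the linking-element list, and all function names are assumptions made purely for illustration. The sketch contrasts a naive substring test with a simple lexicon-based split that only accepts a lexeme if it appears as a whole constituent of the compound.

```python
# Hypothetical toy lexicon (lowercased); a real system would use a full dictionary.
LEXICON = frozenset({"haus", "aufgabe", "sau", "arbeit", "platz", "tor", "fußball"})
# Assumed linking elements: none, or the German linking "s" (as in "Arbeit-s-platz").
LINKING = ("", "s")


def substring_match(compound, lexeme):
    """Naive indication: the lexeme occurs somewhere in the compound."""
    return lexeme.lower() in compound.lower()


def split_compound(word, lexicon=LEXICON):
    """Return every way to split `word` completely into lexicon entries,
    optionally joined by a linking element. Each split is a tuple of constituents."""
    word = word.lower()
    if not word:
        return [()]
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head not in lexicon:
            continue
        tail = word[end:]
        for link in LINKING:
            if not tail.startswith(link):
                continue
            rest = tail[len(link):]
            # A linking element must be followed by another constituent.
            if link and not rest:
                continue
            for sub in split_compound(rest, lexicon):
                results.append((head,) + sub)
    return results


def contains_lexeme(compound, lexeme, lexicon=LEXICON):
    """Stricter check: the lexeme is a constituent in at least one full split."""
    target = lexeme.lower()
    return any(target in parts for parts in split_compound(compound, lexicon))


if __name__ == "__main__":
    for compound, lexeme in [("Hausaufgabe", "Sau"), ("Hausaufgabe", "Haus"),
                             ("Arbeitsplatz", "Platz"), ("Autor", "Tor")]:
        print(compound, lexeme,
              substring_match(compound, lexeme),
              contains_lexeme(compound, lexeme))
```

In this sketch, "Sau" is found as a substring of "Hausaufgabe" but is rejected by the split-based check, because the only complete split is "Haus" + "Aufgabe". The result depends entirely on the assumed lexicon, and ambiguous compounds may yield several splits, so this check is only a stricter heuristic than string matching, not a proof of semantic relation.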