Discovering Compound and Proper Nouns

Authors:
Grzegorz Protaziuk;Marzena Kryszkiewicz;Henryk Rybinski;Alexandre Delteil
Affiliations:
ICS, Warsaw University of Technology;ICS, Warsaw University of Technology;ICS, Warsaw University of Technology;France Telecom R&D
Venue:
RSEISP '07 Proceedings of the international conference on Rough Sets and Intelligent Systems Paradigms
Year:
2007

Citing 9
Cited 3

Word association norms, mutual information, and lexicography

Computational Linguistics
Mining Sequential Patterns: Generalizations and Performance Improvements

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Multiword Expressions: A Pain in the Neck for NLP

CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Discovery of Frequent Word Sequences in Text

Proceedings of the ESF Exploratory Workshop on Pattern Detection and Discovery
Retrieving collocations from text: Xtract

Computational Linguistics - Special issue on using large corpora: I
Retrieving collocations by co-occurrences and word order constraints

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Surface grammatical analysis for the extraction of terminological noun phrases

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 3
Creating a multilingual collocation dictionary from large text corpora

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2
Multiword unit hybrid extraction

MWE '03 Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment - Volume 18

Algorithms for the verification of the semantic relation between a compound and a given lexeme

Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
Lexical ontology layer: a bridge between text and concepts

ISMIS'12 Proceedings of the 20th international conference on Foundations of Intelligent Systems
Cross-language patent matching via an international patent classification-based concept bridge

Journal of Information Science

Abstract

The identification of appropriate text tokens (words or sequences of words representing concepts) is one of the most important tasks of text preprocessing and may greatly influence the final results of text analysis. In our paper, we introduce a new approach to discovering compound nouns, including proper compound nouns. Our approach combines data mining methods with shallow lexical analysis. We propose a simple pattern language for specifying the grammatical patterns that extracted compound nouns must satisfy. Our method requires the words to be annotated with part-of-speech tags, and to this extent it is language-dependent. Based on the GSP data mining algorithm, we propose T-GSP, a modification of it for extracting frequent text patterns, in particular frequent word sequences that satisfy given grammatical rules. The obtained sequences are regarded as candidates for compound nouns. The experiments demonstrate the very high quality of the method.
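
The general idea in the abstract (frequent word sequences filtered by a grammatical pattern over part-of-speech tags) can be illustrated with a minimal sketch. This is not T-GSP or GSP: it only counts contiguous word n-grams per sentence, whereas GSP generates candidates level by level. The function name, the tag names (ADJ, NOUN, PROPN), the regular-expression pattern, and the thresholds are all assumptions made for illustration and do not come from the paper.

```python
import re
from collections import Counter

# Hypothetical grammatical pattern (an assumption, not the paper's pattern
# language): optional adjectives followed by at least two nouns or proper nouns.
COMPOUND_PATTERN = re.compile(r"(ADJ )*((NOUN|PROPN) )+(NOUN|PROPN)")


def frequent_compound_candidates(sentences, min_support=2, max_len=4,
                                 pattern=COMPOUND_PATTERN):
    """Return contiguous word sequences whose tag sequence matches `pattern`
    and which occur in at least `min_support` sentences.

    `sentences` is a list of sentences, each a list of (word, tag) pairs.
    """
    support = Counter()
    for sentence in sentences:
        seen = set()  # count each sequence at most once per sentence
        for n in range(2, max_len + 1):
            for i in range(len(sentence) - n + 1):
                window = sentence[i:i + n]
                tags = " ".join(tag for _, tag in window)
                if pattern.fullmatch(tags):
                    seen.add(tuple(word.lower() for word, _ in window))
        support.update(seen)
    return {seq: count for seq, count in support.items()
            if count >= min_support}


# Tiny hand-tagged example corpus (invented for this sketch).
example_sentences = [
    [("data", "NOUN"), ("mining", "NOUN"), ("is", "VERB"), ("useful", "ADJ")],
    [("we", "PRON"), ("use", "VERB"), ("data", "NOUN"), ("mining", "NOUN")],
    [("Warsaw", "PROPN"), ("University", "PROPN"), ("teaches", "VERB"),
     ("text", "NOUN"), ("analysis", "NOUN")],
]

candidates = frequent_compound_candidates(example_sentences, min_support=2)
print(candidates)  # {('data', 'mining'): 2}
```

With a minimum support of 2, only "data mining" survives. "Warsaw University" and "text analysis" match the tag pattern but each occurs in a single sentence, which shows how the frequency threshold and the grammatical filter act together on candidate sequences.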