Because of natural language morphology, a word can take on many morphological forms. Morphological normalisation, often used in information retrieval and text mining systems, conflates the morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon-based inflectional normalisation. The approach lies between stemming and lemmatisation and is suitable for morphological normalisation of inflectionally complex languages. To eliminate the immense effort required to compile the lexicon by hand, we focus on the problem of automatically acquiring an inflectional morphological lexicon from raw corpora. We propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. We apply the approach to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. Experimental results show that the approach can acquire a lexicon whose linguistic quality allows for rather good normalisation performance.
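As a rough illustration of what lexicon-based inflectional normalisation means in practice, the following Python sketch expands a tiny lexicon of (stem, paradigm) pairs into word-forms and maps each form back to one representative form. It is not the formalism or the acquisition procedure described in the paper: the names PARADIGMS, LEXICON, build_form_index and normalise, the paradigm label "noun_f_a", and the simplified suffix list are all assumptions made for this example, and the lexicon is written by hand rather than acquired from a corpus.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy paradigm: a simplified suffix set for Croatian feminine
# nouns ending in -a. The first suffix is taken to form the representative form.
PARADIGMS: Dict[str, List[str]] = {
    "noun_f_a": ["a", "e", "i", "u", "o", "om", "ama"],
}

# Hypothetical hand-written lexicon of (stem, paradigm) pairs.
# In the paper such entries are acquired automatically from raw corpora.
LEXICON: List[Tuple[str, str]] = [
    ("vod", "noun_f_a"),  # voda (water)
    ("rib", "noun_f_a"),  # riba (fish)
]


def build_form_index(lexicon: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Expand every lexicon entry into its word-forms and map each form
    to the representative form of that entry."""
    index: Dict[str, str] = {}
    for stem, paradigm in lexicon:
        suffixes = PARADIGMS[paradigm]
        representative = stem + suffixes[0]
        for suffix in suffixes:
            # Keep the first mapping if two entries generate the same form.
            index.setdefault(stem + suffix, representative)
    return index


def normalise(tokens: Iterable[str], index: Dict[str, str]) -> List[str]:
    """Replace each token by its representative form; tokens not covered
    by the lexicon are only lowercased."""
    return [index.get(token.lower(), token.lower()) for token in tokens]


INDEX = build_form_index(LEXICON)
print(normalise(["Ribu", "vodama", "more"], INDEX))
```

In this sketch, forms covered by the lexicon are conflated to a single form, while unknown words pass through unchanged. A real system would also need to handle homographs, where one word-form is generated by several lexicon entries.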