From Plain Character Strings to Meaningful Words: Producing Better Full Text Databases for Inflectional and Compounding Languages with Morphological Analysis Software

Authors: Riitta Alkula
Affiliations: Tieto Enator Corporation, Finland
Venue: Information Retrieval
Year: 2001

Abstract

The paper deals with linguistic processing and retrieval techniques in full text databases. Special attention is given to the characteristics of highly inflectional languages and to how the morphological structure of a language should be taken into account when designing and developing information retrieval systems. Finnish is used as an example of a language with a more complicated inflectional structure than English. In the FULLTEXT project, natural language analysis modules for Finnish were incorporated into the commercial BASIS information retrieval system, which is based on inverted files and Boolean searching. Several test databases were produced, each using one or two Finnish morphological analysis programs.
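To illustrate why morphological analysis matters for an inverted file with Boolean searching, the following is a minimal sketch, not the FULLTEXT project's modules or the BASIS system's interface. Everything named here is an assumption made for illustration: the `TOY_ANALYSES` lookup table is a hand-written stand-in for a real Finnish morphological analyser, and `analyse`, `build_index`, and `boolean_and` are hypothetical helper names. The sketch builds one index from plain character strings and one from base forms (with compounds also indexed by their parts), then runs the same Boolean query against both.

```python
from collections import defaultdict

# Hypothetical hand-written lookup standing in for a morphological analyser.
# Each inflected form maps to its base form; a compound also yields its parts.
TOY_ANALYSES = {
    "talossa": ["talo"],        # "in the house"
    "talon": ["talo"],          # "of the house"
    "taloissa": ["talo"],       # "in the houses"
    "kirjakaupassa": ["kirjakauppa", "kirja", "kauppa"],  # "in the bookshop"
    "kirjan": ["kirja"],        # "of the book"
}


def analyse(token):
    """Return index terms for a token; unknown tokens are kept unchanged."""
    return TOY_ANALYSES.get(token, [token])


def build_index(docs, normalise):
    """Build an inverted file mapping each term to a set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for term in normalise(token):
                index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


DOCS = {
    1: "Talossa on kirja",
    2: "Kirjakaupassa on talon kuva",
    3: "Taloissa asuu ihmisiä",
}

# Index of plain character strings versus index of analysed base forms.
plain_index = build_index(DOCS, lambda token: [token])
lemma_index = build_index(DOCS, analyse)

print(boolean_and(plain_index, ["talo"]))           # no inflected form matches
print(boolean_and(lemma_index, ["talo"]))           # all inflected forms match
print(boolean_and(lemma_index, ["talo", "kirja"]))  # compound part also matches
```

In this toy setting, the base form "talo" never occurs as a plain string in the documents, so the plain index returns nothing for it, while the analysed index finds every document containing one of its inflected forms. Indexing compound parts is one possible design choice among several; it lets a query for "kirja" reach a document that only mentions "kirjakaupassa", at the cost of a larger index.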