Comparing the Effect of Syntactic vs. Statistical Phrase Indexing Strategies for Dutch

Authors:
Wessel Kraaij;Renée Pohlmann
Venue:
ECDL '98 Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries
Year:
1998

Citing 3
Cited 3

On the application of syntactic methodologies in automatic text analysis
Information Processing and Management: an International Journal - Special issue on natural language processing and information retrieval

Natural language information retrieval
TREC-2 Proceedings of the second conference on Text retrieval conference

Viewing stemming as recall enhancement
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval

On the Usefulness of Extracting Syntactic Dependencies for Text Indexing
AICS '02 Proceedings of the 13th Irish International Conference on Artificial Intelligence and Cognitive Science

Improving passage retrieval in question answering using NLP
EPIA'05 Proceedings of the 12th Portuguese conference on Progress in Artificial Intelligence

Boosting web retrieval through query operations
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research

Abstract

In this paper we describe the results of experiments contrasting syntactic phrase indexing with statistical phrase indexing for Dutch texts. Our results show that, at a minimum, a compound splitting algorithm is needed for good-quality retrieval of Dutch texts. Adding either syntactic or statistical phrases on top of compound splitting generally improves performance, but the effect is never statistically significant. Comparing the two, syntactic phrases are slightly superior to statistical phrases, particularly at high precision; at higher recall levels the two are equally effective. However, since a compound splitting algorithm requires a dictionary and knowledge about constraints on compound formation, a purely non-linguistic indexing strategy, with or without phrases, does not seem to be very effective for Dutch.
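
The abstract notes that compound splitting depends on a dictionary and on constraints on compound formation. The sketch below illustrates that dependence in a minimal way and is not the authors' algorithm. The lexicon `LEXICON`, the linking elements in `LINKERS`, the `min_len` threshold, and the function name `split_compound` are all assumptions introduced for illustration; a real system would use a full Dutch dictionary and richer morphological constraints.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real system would load a full Dutch dictionary.
LEXICON = {"fiets", "pad", "stad", "bestuur", "verkeer", "ongeval"}
# Assumed linking elements allowed between compound parts ("" means none).
LINKERS = ("", "s")


def split_compound(word, lexicon=LEXICON, min_len=3):
    """Split `word` into lexicon entries, or return [word] if no split exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def parts(start):
        # Return a tuple of parts covering word[start:], or None if impossible.
        if start == len(word):
            return ()
        best = None
        for end in range(start + min_len, len(word) + 1):
            head = word[start:end]
            if head not in lexicon:
                continue
            for link in LINKERS:
                nxt = end + len(link)
                if word[end:nxt] != link:
                    continue
                # A linking element may only appear between two parts.
                if link and nxt == len(word):
                    continue
                rest = parts(nxt)
                if rest is not None:
                    candidate = (head,) + rest
                    # Prefer the analysis with the fewest parts.
                    if best is None or len(candidate) < len(best):
                        best = candidate
        return best

    result = parts(0)
    return list(result) if result else [word]


if __name__ == "__main__":
    for w in ("fietspad", "stadsbestuur", "verkeersongeval", "tafel"):
        print(w, "->", split_compound(w))
```

Words that cannot be decomposed with the available lexicon are returned unchanged, which shows why the quality of the dictionary bounds the quality of the split: without the entries, no constituent terms can be indexed.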