Normalization and matching in the DORO system

  • Authors:
  • C. H. A. Koster; C. Derksen; D. Van De Ende; J. Potjer

  • Affiliations:
  • Department of Computer Science, University of Nijmegen, The Netherlands (all authors)

  • Venue:
  • IRSG'99 Proceedings of the 21st Annual BCS-IRSG conference on Information Retrieval Research
  • Year:
  • 1999

Abstract

This paper is concerned with the use of linguistically motivated phrases as indexing terms in Information Retrieval applications. In addition to conventional noun phrases, we propose to use verb phrases as index terms for text classification. Techniques for phrase matching through syntactic normalization and semantic matching are described. We discuss the realization of the syntactic normalization of phrases by transduction to frames. Semantic normalization is based on lexico-semantic relations, taking into account certain properties of the classification algorithms used. The ideas described here are being implemented in the document routing system DORO, in which statistical learning algorithms are applied to document profiles consisting of phrases. This paper describes the rationale behind work in progress, rather than presenting final results.
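
To make the idea of matching phrases through normalization concrete, here is a minimal Python sketch. It is not the DORO implementation, which the abstract does not detail. The frame shape (a head and modifier pair), the two toy phrase patterns, the function names `normalize`, `canonical`, and `frames_match`, and the `SYNONYMS` table are all assumptions made purely for illustration. A real system would use a parser and a lexical resource rather than string patterns.

```python
import re

# Hypothetical stop list and synonym table, for illustration only.
STOPWORDS = {"the", "a", "an"}
SYNONYMS = {"contamination": "pollution"}


def normalize(phrase):
    """Map a simple noun phrase to a (head, modifier) frame.

    Only two toy surface patterns are handled:
      "<modifier> <head>"     e.g. "air pollution"
      "<head> of <modifier>"  e.g. "pollution of the air"
    """
    words = [w for w in re.findall(r"[a-z]+", phrase.lower())
             if w not in STOPWORDS]
    if len(words) == 3 and words[1] == "of":
        return (words[0], words[2])
    if len(words) == 2:
        return (words[1], words[0])
    if len(words) == 1:
        return (words[0], None)
    raise ValueError(f"unsupported phrase pattern: {phrase!r}")


def canonical(word):
    """Replace a word by its assumed canonical synonym, if one is listed."""
    return SYNONYMS.get(word, word)


def frames_match(phrase_a, phrase_b):
    """Compare two phrases after syntactic and (toy) semantic normalization."""
    frame_a = tuple(canonical(w) for w in normalize(phrase_a))
    frame_b = tuple(canonical(w) for w in normalize(phrase_b))
    return frame_a == frame_b
```

Under these assumptions, "air pollution" and "pollution of the air" normalize to the same frame, and the synonym table lets "air contamination" match both. Comparing frames rather than raw strings is what lets differently worded phrases count as the same index term.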