Automatic phrase recognition and extraction from text

Authors:
Fergus Kelledy; Alan F. Smeaton
Affiliations:
School of Computer Applications, Dublin City University, Dublin 9, Ireland;School of Computer Applications, Dublin City University, Dublin 9, Ireland
Venue:
IRSG'97 Proceedings of the 19th Annual BCS-IRSG conference on Information Retrieval Research
Year:
1997

Abstract

One of the problems facing researchers in Information Retrieval (IR) is that the search criteria used during retrieval (the query) contain terms that are highly ambiguous and very common. By this we mean that a term can have multiple meanings and can occur in a large percentage of the documents in a text collection. Many approaches to this problem have been tried, with varying degrees of success. One approach is to make the vocabulary used by the IR system less ambiguous by using terms that occur only infrequently. In our case this is achieved through automatic phrase recognition and the incorporation of the recognised phrases into the lexicon of the indexing mechanism. Unlike previous phrase recognition approaches based on natural language processing (NLP), our work requires no linguistic processing of the text in order to extract phrases; it is instead comparable to what are called 'statistical phrases'. In this paper we describe experiments in which we evaluate our phrase recognition on the TREC-4 and TREC-5 collections.
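
The abstract does not spell out the recognition procedure, so the following is only an illustrative sketch of one common reading of 'statistical phrases': adjacent word pairs, neither of which is a stopword, that occur in at least a minimum number of documents. The stopword list, the `min_doc_freq` threshold, and all function names are assumptions made for this example and are not taken from the paper.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "on", "the", "to"})


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_pairs(tokens):
    """Yield adjacent word pairs in which neither word is a stopword."""
    for left, right in zip(tokens, tokens[1:]):
        if left not in STOPWORDS and right not in STOPWORDS:
            yield (left, right)


def statistical_phrases(documents, min_doc_freq=2):
    """Return pairs that appear in at least min_doc_freq documents.

    No parsing or tagging is involved: selection relies only on
    adjacency and document frequency (an assumed criterion).
    """
    doc_freq = Counter()
    for text in documents:
        # A set ensures each pair is counted once per document.
        doc_freq.update(set(candidate_pairs(tokenize(text))))
    return {pair: df for pair, df in doc_freq.items() if df >= min_doc_freq}


docs = [
    "Information retrieval systems index documents.",
    "Phrases improve information retrieval.",
    "The retrieval of information is hard.",
]
phrases = statistical_phrases(docs)
print(phrases)
```

In this toy collection only the pair ("information", "retrieval") reaches the threshold, and it could then be added to the indexing lexicon as a single, less ambiguous term alongside its component words.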