Experiments in Automatic Phrase Indexing For Document Retrieval: A Comparison of Syntactic and Non-Syntactic Methods

Authors:
Joel L Fagan
Affiliations:
-
Venue:
Experiments in Automatic Phrase Indexing For Document Retrieval: A Comparison of Syntactic and Non-Syntactic Methods
Year:
1987

Citing 0
Cited 19

Experiments on incorporating syntactic processing of user queries into a document retrieval strategy
SIGIR '88 Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval

Natural language techniques for intelligent information retrieval
SIGIR '88 Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval

Automatic text indexing using complex identifiers
DOCPROCS '88 Proceedings of the ACM conference on Document processing systems

On the application of syntactic methodologies in automatic text analysis
SIGIR '89 Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval

Using syntactic analysis in a document retrieval system that uses signature files
SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval

The use of phrases and structured queries in information retrieval
SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval

Retrieval activities in a database consisting of heterogeneous collections of structured text
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval

FLEXICON: an evaluation of a statistical ranking model adapted to intelligent legal text management
ICAIL '93 Proceedings of the 4th international conference on Artificial intelligence and law

Improving English and Chinese Ad-Hoc Retrieval: A Tipster Text Phase 3 Project Report
Information Retrieval

Syntactic approaches to automatic book indexing
ACL '88 Proceedings of the 26th annual meeting on Association for Computational Linguistics

The importance of proper weighting methods
HLT '93 Proceedings of the workshop on Human Language Technology

Structured queries in XML retrieval
Proceedings of the 14th ACM international conference on Information and knowledge management

Articulating information needs in XML query languages
ACM Transactions on Information Systems (TOIS)

Statistical query translation models for cross-language information retrieval
ACM Transactions on Asian Language Information Processing (TALIP)

Extraction of complex index terms in non-English IR: A shallow parsing based approach
Information Processing and Management: an International Journal

Improving automated requirements trace retrieval: a study of term-based enhancement methods
Empirical Software Engineering

When close enough is good enough: approximate positional indexes for efficient ranked retrieval
Proceedings of the 20th ACM international conference on Information and knowledge management

Boosting web retrieval through query operations
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research

Document vector representations for feature extraction in multi-stage document ranking
Information Retrieval

Abstract

In order for an automatic information retrieval system to effectively retrieve documents related to a given subject area, the content of each document in the system's database must be represented accurately. This study examines the hypothesis that better representations of document content can be constructed if the content analysis method takes into consideration the syntactic structure of document and query texts. Two methods of automatically generating phrases for use as content indicators have been implemented and tested experimentally. The non-syntactic (or statistical) method is based on simple text characteristics such as word frequency and the proximity of words in text. The syntactic method uses augmented phrase structure rules (production rules) to selectively extract phrases from parse trees generated by an automatic syntactic analyzer. Experimental results show that the effect of non-syntactic phrase indexing is inconsistent. For the five collections tested, increases in average precision ranged from 22.7% to 2.2% over simple, single term indexing. The syntactic phrase indexing method was tested on two collections. Precision figures averaged over all test queries indicate that non-syntactic phrase indexing performs significantly better than syntactic phrase indexing for one collection, but that the difference is insignificant for the other collection. More detailed analysis of individual queries, however, indicates that the performance of both methods is highly variable, and that there is evidence that syntax-based indexing has certain benefits not available with the non-syntactic approach. Possible improvements of both methods of phrase indexing are considered. It is concluded that the prospects for improving the syntax-based approach to document indexing are better than for the non-syntactic approach. The PLNLP system was used for syntactic analysis of document and query texts, and for implementing the syntax-based phrase construction rules. The SMART information retrieval system was used for retrieval experimentation.
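
As a rough illustration of the non-syntactic method described in the abstract, the following sketch builds two-word phrase descriptors from word proximity and document frequency. It is not the procedure used in the study: the stopword list, the function names (`tokenize`, `candidate_phrases`, `phrase_index`), the window size, and the `min_df` threshold are all assumptions chosen for the example.

```python
from collections import Counter

# Illustrative stopword list only; a real system would use a fuller one.
STOPWORDS = {"the", "of", "a", "an", "in", "for", "and", "to", "is", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def candidate_phrases(tokens, window=3):
    """Return unordered word pairs whose token positions differ by less
    than `window` (positions are counted after stopword removal)."""
    pairs = set()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                pairs.add(tuple(sorted((left, right))))
    return pairs


def phrase_index(docs, window=3, min_df=2):
    """Assign to each document the candidate pairs that occur in at
    least `min_df` documents of the collection."""
    doc_freq = Counter()
    per_doc = []
    for text in docs:
        pairs = candidate_phrases(tokenize(text), window)
        per_doc.append(pairs)
        doc_freq.update(pairs)  # each pair counted once per document
    return [{p for p in pairs if doc_freq[p] >= min_df} for pairs in per_doc]


if __name__ == "__main__":
    docs = [
        "Automatic information retrieval system",
        "Information retrieval experiments",
        "Syntactic analysis of text",
    ]
    for phrases in phrase_index(docs):
        print(sorted(phrases))
```

In this toy collection only the pair ("information", "retrieval") appears in two documents, so it is the only phrase kept. A syntactic method would instead select phrases from parse trees, which this sketch does not attempt.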