Natural language information retrieval: TIPSTER-2 final report

  • Authors:
  • Tomek Strzalkowski

  • Affiliations:
  • GE Corporate Research & Development, Schenectady, NY

  • Venue:
  • TIPSTER '96: Proceedings of a workshop held at Vienna, Virginia, May 6-8, 1996
  • Year:
  • 1996

Abstract

We report on the joint GE/NYU natural language information retrieval project as related to the Tipster Phase 2 research conducted initially at NYU and subsequently at the GE R&D Center and NYU. The evaluation results discussed here were obtained in connection with the 3rd and 4th Text Retrieval Conferences (TREC-3 and TREC-4). The main thrust of this project is to use natural language processing techniques to enhance the effectiveness of full-text document retrieval. Over the course of the four TREC conferences, we have built a prototype IR system designed around a statistical full-text indexing and search backbone provided by NIST's Prise engine. The original Prise has been modified to handle multi-word phrases, differential term weighting schemes, automatic query expansion, index partitioning and rank merging, as well as complex documents. Natural language processing is used to preprocess the documents in order to extract content-carrying terms, discover inter-term dependencies, build a conceptual hierarchy specific to the database domain, and process users' natural language requests into effective search queries.

The overall architecture of the system is essentially the same for both years, as our efforts were directed at optimizing the performance of all components. A notable exception is the new massive query expansion module used in the routing experiments, which replaces a prototype extension used in the TREC-3 system. On the other hand, it should be noted that the character and level of difficulty of the TREC queries have changed quite significantly since last year's evaluation. The new TREC-4 ad-hoc queries are far shorter and less focused, and they have the flavor of information requests ("What is the prognosis of ...") rather than the search directives typical of earlier TRECs ("The relevant document will contain ..."). This makes building good search queries a more sensitive task than before. We thus decided to introduce only a minimal number of changes to our indexing and search processes, and even to roll back some of the TREC-3 extensions that dealt with longer and somewhat redundant queries.
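As a concrete illustration of what such an expansion step involves, the sketch below adds terms drawn from known relevant documents to a routing query. It is a minimal sketch under our own assumptions: the tf-idf scoring, the top_k cutoff, and all function names are ours for illustration, not the actual module's design.

```python
from collections import Counter
from math import log

def expand_query(query_terms, relevant_docs, all_docs, top_k=20):
    """Add high-scoring terms from known relevant documents to a
    routing query. Documents are plain token lists; the tf-idf
    scoring below is an assumption standing in for the module's
    actual term-selection criteria."""
    doc_freq = Counter()
    for doc in all_docs:
        doc_freq.update(set(doc))        # document frequency per term
    rel_freq = Counter()
    for doc in relevant_docs:
        rel_freq.update(doc)             # term frequency within relevant docs
    n_docs = len(all_docs)
    scores = {
        term: freq * log(n_docs / doc_freq[term])
        for term, freq in rel_freq.items()
        if term not in query_terms and doc_freq[term] > 0
    }
    expansion = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return list(query_terms) + expansion
```

In practice the documents would carry the NLP-derived terms described above rather than raw tokens, and a genuinely massive expansion would use a far larger cutoff than the illustrative default here.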
Overall, our system performed quite well, and our position relative to the best systems has improved steadily since the beginning of TREC. We participated in both main evaluation categories, category A ad-hoc and routing, working with approx. 3.3 GBytes of text. We submitted four official runs: one automatic ad-hoc, one manual ad-hoc, and two automatic routing runs, and were ranked 6th or 7th in each category (out of 38 participating teams). It should be noted that the most significant gain in performance seems to have occurred in precision near the top of the ranking, at 5, 10, 15 and 20 documents. Indeed, our unofficial manual runs performed after the TREC-4 conference show superior results in these categories, topping by a large margin the best manual scores by any system in the official evaluation.

In general, we note a substantial improvement in performance when phrasal terms are used, especially in the ad-hoc runs. Looking back at TREC-2 and TREC-3, one may observe that these improvements appear to be tied to the length and specificity of the query: the longer the query, the more improvement from linguistic processes. This can be seen by comparing the improvement over baseline for automatic ad-hoc runs (very short queries), manual runs (longer queries), and semi-interactive runs (yet longer queries). In addition, our TREC-3 results (with long and detailed queries) showed a 20-25% improvement in precision attributable to NLP, compared with 10-16% in TREC-4.
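To make the notion of phrasal terms concrete, the sketch below turns part-of-speech-tagged text into two-word index terms. Adjacency-based pairing and the tag set are our assumptions for illustration; the actual system derived such pairs through full syntactic analysis rather than simple adjacency.

```python
def phrasal_terms(tagged_tokens):
    """Extract two-word phrasal index terms from (word, POS-tag)
    pairs. Adjacent adjective/noun + noun pairs stand in for the
    syntactically derived phrases of the real NLP pipeline (an
    assumption made for this sketch)."""
    noun_tags = {"NN", "NNS", "NNP"}
    modifier_tags = noun_tags | {"JJ"}
    terms = []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1 in modifier_tags and t2 in noun_tags:
            terms.append(f"{w1.lower()}_{w2.lower()}")
    return terms

# Example: both "joint venture" and "venture capital" become index terms.
tagged = [("joint", "JJ"), ("venture", "NN"), ("capital", "NN")]
print(phrasal_terms(tagged))   # ['joint_venture', 'venture_capital']
```

Such phrasal terms could then be weighted differently from single-word terms, in line with the differential term weighting scheme mentioned in the abstract.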