A new approach for improving field association term dictionary using passage retrieval

  • Authors:
  • Kazuhiro Morita;El-Sayed Atlam;Elmarhomy Ghada;Masao Fuketa;Jun-ichi Aoe

  • Affiliations:
  • Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima, Japan (all authors)

  • Venue:
  • KES'06 Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part II
  • Year:
  • 2006

Abstract

Large collections of full-text documents are now commonly used in automated information retrieval. Readers generally identify the subject of a text when they notice specific terms, called Field Association (FA) terms, in that text. Previous research showed that evidence from passages can improve retrieval results by dividing documents into coherent units, with each unit corresponding to a subtopic. Moreover, many current researchers extract FA term candidates from whole documents to build an FA term dictionary automatically. This paper proposes a method for automatically building a new FA term dictionary from documents after applying passage retrieval. A WWW search engine is used to extract FA term candidates from passage document corpora. Then, the new FA term candidates in each field are automatically compared with a previously determined FA term dictionary. Finally, new FA terms from the extracted term candidates are appended automatically to the existing FA term dictionary. Experimental results show that the new technique using passage documents automatically appends about 15% more FA terms from the term candidates to the existing FA term dictionary than the old method. Moreover, recall and precision improve significantly, by 20% and 32% respectively, over the traditional method. The proposed methods are applied to 38,372 articles from a large tagged corpus.
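
The three steps named in the abstract (extract candidates from passages, compare them with the existing dictionary, append the new ones) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how passages are delimited or how the WWW search engine selects candidates, so the sketch assumes fixed-size word windows and a simple frequency threshold in their place. All function names, the `size` and `min_count` parameters, and the sample data are hypothetical.

```python
from collections import Counter


def split_into_passages(text, size=50):
    """Split text into fixed-size word windows (assumed stand-in for passage retrieval)."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def extract_candidates(passages, min_count=2):
    """Return words occurring at least min_count times across all passages.

    Assumption: a frequency threshold replaces the search-engine-based
    candidate extraction described in the abstract.
    """
    counts = Counter(word for passage in passages for word in passage)
    return {word for word, n in counts.items() if n >= min_count}


def update_dictionary(fa_dictionary, field, candidates):
    """Compare candidates with the field's known terms and append the unseen ones.

    Returns the set of newly appended terms.
    """
    known = fa_dictionary.setdefault(field, set())
    new_terms = candidates - known
    known |= new_terms
    return new_terms


# Hypothetical example data: one field with one previously known FA term.
fa_dictionary = {"sports": {"goal"}}
text = "goal keeper saves goal striker scores striker"

passages = split_into_passages(text, size=4)
candidates = extract_candidates(passages, min_count=2)
added = update_dictionary(fa_dictionary, "sports", candidates)
print(sorted(added))
```

Keeping the dictionary as a mapping from field name to a set of terms makes the comparison step a single set difference, so terms already present are never appended twice.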