A similarity-based method for retrieving documents from the SCI/SSCI database

  • Authors:
  • Yen-Liang Chen;Jhong-Jhih Wei;Shin-Yi Wu;Ya-Han Hu

  • Affiliations:
  • -;-;-;Department of Information Management, National Central University, Taiwan, R.O.C.

  • Venue:
  • Journal of Information Science
  • Year:
  • 2006

Abstract

As more and more documents become electronically available, finding documents in large databases that fit users' needs is becoming increasingly important. In the past, the document search problem was dealt with using either the database query approach or the text-based search approach. In this paper, we investigate this problem, focusing on the SCI/SSCI databases from ISI. Specifically, we design our search methodology based on the four fields commonly seen in a scientific research document: abstract, title, keywords, and reference list. Of these four, only the abstract can be viewed as normal text, while the other three have their own characteristics that differentiate them from ordinary text. Therefore, we first develop a method to compute the similarity value for each field. Our next problem is combining the four similarity values into a final value. One approach is to assign a weight to each and compute the weighted sum. We have not adopted this simple weighting method, however, because it is difficult to determine appropriate weights. Instead, we use a back-propagation neural network to combine them. Finally, extensive experiments have been carried out using real documents drawn from the TKDE journal, and the results indicate that in all situations our method has a much higher accuracy than the traditional text-based search approach.
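
The weighted-sum baseline that the abstract mentions and sets aside can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not say how each field's similarity is computed, so cosine similarity over word counts (abstract, title) and Jaccard overlap (keywords, reference list) are stand-in assumptions, and the function names, dictionary keys, and weights are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(items_a, items_b):
    """Size of the intersection divided by size of the union of two item sets."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def field_similarities(doc_a, doc_b):
    """One similarity value per field (assumed measures, see lead-in)."""
    return [
        cosine_similarity(doc_a["abstract"], doc_b["abstract"]),
        cosine_similarity(doc_a["title"], doc_b["title"]),
        jaccard_similarity(doc_a["keywords"], doc_b["keywords"]),
        jaccard_similarity(doc_a["references"], doc_b["references"]),
    ]


def weighted_sum(similarities, weights):
    """Combine per-field similarities with fixed, hand-chosen weights."""
    return sum(w * s for w, s in zip(weights, similarities))
```

The difficulty the abstract points to is visible in `weighted_sum`: the `weights` argument has to be chosen by hand. The authors instead train a back-propagation neural network to map the four similarity values to a single score, which this sketch does not attempt to reproduce.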