Using Google latent semantic distance to extract the most relevant information

  • Authors:
  • Ping-I Chen; Shi-Jen Lin; Ya-Chi Chu

  • Affiliations:
  • Department of Information Management, National Central University, Chung-Li 320, Taiwan, ROC; Department of Information Management, National Central University, Chung-Li 320, Taiwan, ROC; Department of Information Management, National Central University, Chung-Li 320, Taiwan, ROC

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.05

Abstract

Many studies have examined how to help users enter additional keywords into a search engine so that it returns the most relevant documents or search results. Methods previously reported in the literature require a database that stores the user profile and a well-trained model that suggests a potential "next keyword" to the user. Because these predictive models depend on their training data, each can be used in only a single knowledge domain. In this paper, we describe a new algorithm called "Google latent semantic distance" (GLSD) and use it to extract the most important sequence of keywords, which in turn yields the most relevant search results for the user. Our method relies on online, real-time processing and needs no training data, so it can be applied across different knowledge domains. Our experiments show that GLSD achieves high accuracy and that, in most cases, the most relevant information appears among the top search results. We believe that this new system can increase users' effectiveness in both reading and writing articles.
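
The abstract does not define GLSD, so the sketch below should not be read as the authors' algorithm. As a related illustration of how a training-free, hit-count-based semantic distance can rank candidate keywords, it uses the well-known Normalized Google Distance (NGD) of Cilibrasi and Vitányi instead. The function names, the `hits` and `pair_hits` dictionaries, and all counts are hypothetical; in practice the counts would come from live search-engine queries.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """NGD from page-hit counts (not GLSD, which the abstract leaves undefined).

    fx, fy: hits for each term alone; fxy: hits for both terms together;
    n: assumed total number of indexed pages.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms that never co-occur are treated as unrelated
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

def rank_candidates(seed, candidates, hits, pair_hits, n):
    """Order candidate keywords from semantically closest to farthest from seed."""
    scored = []
    for cand in candidates:
        together = pair_hits.get(frozenset((seed, cand)), 0)
        distance = normalized_google_distance(hits[seed], hits[cand], together, n)
        scored.append((distance, cand))
    return [cand for _, cand in sorted(scored)]

# Hypothetical hit counts, for illustration only.
hits = {"jaguar": 1000, "car": 5000, "banana": 4000}
pair_hits = {
    frozenset(("jaguar", "car")): 400,
    frozenset(("jaguar", "banana")): 5,
}
ranking = rank_candidates("jaguar", ["banana", "car"], hits, pair_hits, n=10**6)
print(ranking)
```

Because the distance is computed from counts fetched at query time, a measure of this kind needs no training data or stored user profile, which is the property the abstract emphasizes.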