Using Google latent semantic distance to extract the most relevant information

Authors:
Ping-I Chen; Shi-Jen Lin; Ya-Chi Chu
Affiliations:
Department of Information Management, National Central University, Chung-Li 320, Taiwan, ROC (all authors)
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

Many studies have examined how to help users add keywords to a search-engine query so that the most relevant documents or search results are returned. Previously reported methods require a database to store the user profile and a well-trained model to suggest a potential "next keyword" to the user. Because these predictive models depend on their training data, each can be used in only a single knowledge domain. In this paper, we describe a new algorithm called "Google latent semantic distance" (GLSD) and use it to extract the most important sequence of keywords, which in turn yields the most relevant search results for the user. Our method relies on online, real-time processing and needs no training data, so it can be applied across different knowledge domains. Our experiments show that GLSD achieves high accuracy and that, in most cases, the most relevant information appears among the top search results. We believe that this new system can increase users' effectiveness in both reading and writing articles.
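
The abstract does not define GLSD itself, so the following is not the authors' algorithm. As background only, and assuming from its name that GLSD builds on the Normalized Google Distance (NGD) of Cilibrasi and Vitányi, here is a minimal Python sketch of NGD computed from search-engine page counts, together with a hypothetical helper that orders candidate keywords by their distance to a query term. The function names, the `rank_candidates` helper, and all page counts are illustrative assumptions; a real system would obtain the counts from live search queries.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google Distance from page counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      assumed total number of indexed pages
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # The terms never occur (together): treat them as infinitely distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


def rank_candidates(query_count, candidates, n):
    """Hypothetical helper: order candidate keywords, nearest first.

    candidates maps each term to (term_count, joint_count_with_query).
    """
    scored = [
        (normalized_google_distance(query_count, count, joint, n), term)
        for term, (count, joint) in candidates.items()
    ]
    return [term for _, term in sorted(scored)]


# Made-up page counts, for illustration only.
example = {
    "retrieval": (1000, 1000),  # always co-occurs with the query term
    "ranking": (1000, 10),      # rarely co-occurs
    "gardening": (1000, 0),     # never co-occurs
}
ordering = rank_candidates(1000, example, 1_000_000)
```

A smaller distance means the two terms co-occur more often than their individual frequencies would suggest, which is the kind of signal a training-free keyword extractor can use across knowledge domains.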