Categorizing and ranking search engine's results by semantic similarity

Authors:
Tianyong Hao; Zhi Lu; Shitong Wang; Tiansong Zou; Shenhua Gu; Liu Wenyin
Affiliations:
City University of Hong Kong, Hong Kong, China;City University of Hong Kong, Hong Kong, China;City University of Hong Kong, Hong Kong, China;City University of Hong Kong, Hong Kong, China;City University of Hong Kong, Hong Kong, China;City University of Hong Kong, Hong Kong, China
Venue:
Proceedings of the 2nd international conference on Ubiquitous information management and communication
Year:
2008

Abstract

This paper proposes an automatic method for categorizing and ranking search engine results by semantic similarity. Nouns and verbs are first extracted from the snippets returned by the search engine, using Named Entity Recognition and part-of-speech tagging. A semantic similarity algorithm based on WordNet then calculates the similarity of each snippet to each of the pre-defined categories. A balanced similarity ranking method, which combines semantic similarity with Google's rank and the timeliness of the pages, is used to rank the snippets. Preliminary experiments with 500 labeled questions from TREC03 show that 72.7% are correctly categorized.
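
The pipeline outlined in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract gives no formulas, so the tiny `HYPERNYM` map (a stand-in for WordNet), the path-based word similarity, the averaging in `snippet_similarity`, and the weights and score shapes in `balanced_score` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Hypothetical toy hypernym map standing in for WordNet (not the real hierarchy).
HYPERNYM = {
    "dog": "animal", "cat": "animal", "animal": "entity",
    "car": "vehicle", "bus": "vehicle", "vehicle": "entity",
}

def path_to_root(word):
    """Return the chain word -> hypernym -> ... until no hypernym is known."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def word_similarity(a, b):
    """Assumed measure: 1 / (1 + path length via the lowest shared ancestor)."""
    pa, pb = path_to_root(a), path_to_root(b)
    depth_b = {w: i for i, w in enumerate(pb)}
    for i, w in enumerate(pa):
        if w in depth_b:
            return 1.0 / (1 + i + depth_b[w])
    return 0.0  # no shared ancestor in the toy map

def snippet_similarity(snippet_words, category_words):
    """Average, over snippet words, of the best match against the category words."""
    if not snippet_words or not category_words:
        return 0.0
    best = [max(word_similarity(w, c) for c in category_words) for w in snippet_words]
    return sum(best) / len(best)

def categorize(snippet_words, categories):
    """Pick the pre-defined category with the highest snippet similarity."""
    return max(categories, key=lambda name: snippet_similarity(snippet_words, categories[name]))

@dataclass
class Snippet:
    title: str
    words: list        # nouns and verbs already extracted from the snippet
    engine_rank: int   # 1 = top result returned by the search engine
    age_days: float    # page age, used as a timeliness signal

def balanced_score(sim, engine_rank, age_days, w_sim=0.6, w_rank=0.3, w_time=0.1):
    """Hypothetical weighted mix of similarity, engine rank, and timeliness."""
    rank_score = 1.0 / engine_rank
    time_score = 1.0 / (1.0 + age_days / 365.0)
    return w_sim * sim + w_rank * rank_score + w_time * time_score

def rank_snippets(snippets, category_words):
    """Sort snippets by the balanced score, best first."""
    def score(s):
        return balanced_score(snippet_similarity(s.words, category_words),
                              s.engine_rank, s.age_days)
    return sorted(snippets, key=score, reverse=True)

if __name__ == "__main__":
    categories = {"animals": ["animal"], "transport": ["vehicle"]}
    print(categorize(["dog", "cat"], categories))
    results = [Snippet("a", ["car"], 1, 0.0), Snippet("b", ["animal"], 2, 0.0)]
    print([s.title for s in rank_snippets(results, categories["animals"])])
```

With these assumed weights, a snippet that matches the category closely can outrank one that the search engine placed higher, which is the kind of trade-off a balanced ranking is meant to capture. A real implementation would replace the toy map with WordNet lookups and tune the weights empirically.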