This paper proposes an automatic method for categorizing and ranking search engine results by semantic similarity. We first extract nouns and verbs from the snippets returned by the search engine, using named entity recognition and part-of-speech tagging. A WordNet-based semantic similarity algorithm then computes the similarity of each snippet to each of the predefined categories. To rank the snippets, we propose a balanced ranking method that combines this similarity with Google's rank and the timeliness of the pages. Preliminary experiments with 500 labeled questions from TREC03 show that 72.7% are correctly categorized.
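
The pipeline described above (term extraction, category similarity, balanced ranking) can be illustrated with a minimal Python sketch. The abstract does not give the similarity formula or the ranking weights, so everything below is an assumption made for illustration: a tiny hand-built hypernym table (`TOY_HYPERNYMS`) stands in for WordNet, a simple path-length measure stands in for the paper's similarity algorithm, and the weights and decay in `balanced_score` are hypothetical. Snippets are assumed to be already reduced to their nouns and verbs.

```python
# Toy hypernym links standing in for WordNet (hypothetical data, not the real lexicon).
TOY_HYPERNYMS = {
    "dog": "animal", "cat": "animal", "animal": "entity",
    "car": "vehicle", "bus": "vehicle", "vehicle": "entity",
}


def path_to_root(word):
    """Return the chain word -> hypernym -> ... up to the root of the toy taxonomy."""
    path = [word]
    while path[-1] in TOY_HYPERNYMS:
        path.append(TOY_HYPERNYMS[path[-1]])
    return path


def word_similarity(a, b):
    """Path-based similarity: 1 / (1 + edges between a and b via their lowest common ancestor)."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    for steps_a, node in enumerate(path_a):
        if node in path_b:
            return 1.0 / (1 + steps_a + path_b.index(node))
    return 0.0  # no common ancestor, e.g. a word missing from the taxonomy


def snippet_similarity(terms, category_terms):
    """Average, over snippet terms, of the best match against any category term."""
    if not terms or not category_terms:
        return 0.0
    best = [max(word_similarity(t, c) for c in category_terms) for t in terms]
    return sum(best) / len(best)


def categorize(terms, categories):
    """Return (category name, similarity) for the most similar predefined category."""
    scores = {name: snippet_similarity(terms, words) for name, words in categories.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


def balanced_score(similarity, rank, age_days, w_sim=0.6, w_rank=0.3, w_time=0.1):
    """Weighted mix of similarity, search-engine rank (1 = top), and page freshness.

    The weights and the decay functions are assumptions, not values from the paper.
    """
    rank_score = 1.0 / rank
    time_score = 1.0 / (1.0 + age_days / 365.0)
    return w_sim * similarity + w_rank * rank_score + w_time * time_score


categories = {"animals": ["animal"], "transport": ["vehicle"]}
label, sim = categorize(["dog", "cat"], categories)
score = balanced_score(sim, rank=1, age_days=0)
```

In practice the toy table would be replaced by a real WordNet interface, and the weights would be tuned on labeled data such as the TREC questions mentioned above.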