Semantically driven snippet selection for supporting focused web searches

Authors:
Iraklis Varlamis;Sofia Stamou
Affiliations:
Harokopio University of Athens, Department of Informatics and Telematics, 89, Harokopou Street, 17671 Athens, Greece;Patras University, Computer Engineering and Informatics Department, 26500 Patras, Greece
Venue:
Data & Knowledge Engineering
Year:
2009

Abstract

Millions of people search the web to locate information of interest to them, and for many users searching is the primary means of accessing web content. To search, a user visits a web search engine and specifies a query, typically a few keywords, that best describes the information need. When the query is issued, the engine's retrieval modules identify a set of potentially relevant pages in the engine's index and return them to the user, ordered by their relevance to the query keywords. Currently, all major search engines display search results as a ranked list of URLs (pointing to the relevant pages' physical locations on the web), accompanied by the returned pages' titles and small text fragments that summarize the context of the search keywords. These text fragments are widely known as snippets, and they offer a glimpse of the returned pages' contents. In general, snippets extracted from the retrieved pages indicate how useful each page is to the query intention, and they help users browse the search results and decide which pages to visit. Thus far, snippet extraction has relied on statistical methods to determine which text fragments contain most of the query keywords. Typically, the first two text nuggets in the page's contents that contain the query keywords are merged to produce the final snippet that accompanies the page's title and URL in the search results. Unfortunately, statistically generated snippets are not always representative of the pages' contents, nor are they always closely related to the query intention. Such snippets might mislead web users into visiting pages of little interest or use to them.
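The keyword-driven baseline described above (merging the first two text nuggets that contain the query keywords) can be sketched roughly as follows. This is a minimal illustration and not the extraction code of any actual search engine: the sentence-based splitting, the " ... " joiner, and the function names `split_fragments` and `keyword_snippet` are all assumptions made for the example.

```python
import re


def split_fragments(text):
    """Split text into sentence-like fragments (a naive stand-in for 'text nuggets')."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def keyword_snippet(text, query, max_fragments=2, joiner=" ... "):
    """Merge the first `max_fragments` fragments that contain any query keyword.

    Hypothetical helper: matching is case-insensitive on whole words only.
    """
    keywords = {w.lower() for w in re.findall(r"\w+", query)}
    hits = []
    for fragment in split_fragments(text):
        words = {w.lower() for w in re.findall(r"\w+", fragment)}
        if keywords & words:
            hits.append(fragment)
            if len(hits) == max_fragments:
                break
    return joiner.join(hits)


page = (
    "Jaguar is a car maker. The weather is nice. "
    "The jaguar lives in forests. Jaguar cubs are small."
)
print(keyword_snippet(page, "jaguar"))
# prints: Jaguar is a car maker. ... The jaguar lives in forests.
```

The toy page shows the weakness the abstract points to: the first two keyword matches mix two unrelated senses of the query term, so the resulting snippet need not reflect either the query intention or the page as a whole.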
In this article, we propose a snippet selection technique that identifies, within the contents of the query-relevant pages, those text fragments that are both highly relevant to the query intention and expressive of the pages' entire contents. Our motivation is to help web users make informed decisions before clicking on a page in the list of search results. Towards this goal, we first show how to analyze search results in order to decipher the query intention. Then, we process the contents of the query-matching pages in order to identify text fragments that correlate highly with the query semantics. Finally, we evaluate the query-related text fragments in terms of coherence and expressiveness, and pick from every retrieved page the text nugget that correlates highly with the query intention and is also highly representative of the page's content. A thorough evaluation over a large number of web pages and queries suggests that the proposed technique extracts good-quality snippets with high precision and recall, and that it outperforms existing snippet selection methods. Our study also reveals that the snippets delivered by our method can help web users decide which results to click. Overall, our study suggests that semantically driven snippet selection can be used to augment traditional snippet extraction approaches, which depend mainly on the statistical properties of words within a text.
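The selection step outlined above, which picks from each page the fragment that is both close to the query intention and representative of the page, can be illustrated with a rough sketch. The abstract does not specify the semantic measures used, so this example substitutes plain bag-of-words cosine similarity for them and does not model coherence at all. The `intent_terms` input (a set of terms assumed to capture the deciphered query intention), the `alpha` weight, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a crude substitute for semantic analysis."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_snippet(fragments, intent_terms, alpha=0.5):
    """Return the fragment with the best combined score.

    Score = alpha * similarity to the intent terms (assumed input)
          + (1 - alpha) * similarity to the whole page (representativeness).
    Both the weighting and the similarity measure are assumptions.
    """
    if not fragments:
        return ""
    page_vec = Counter(t for f in fragments for t in tokens(f))
    intent_vec = Counter(t.lower() for t in intent_terms)

    def score(fragment):
        vec = Counter(tokens(fragment))
        return alpha * cosine(vec, intent_vec) + (1 - alpha) * cosine(vec, page_vec)

    return max(fragments, key=score)


fragments = [
    "Jaguar is a car maker.",
    "The jaguar is a big cat that lives in forests.",
    "The weather was nice.",
]
best = select_snippet(fragments, intent_terms=["jaguar", "cat", "forests"])
print(best)
```

In this toy setup the intent terms stand in for the outcome of the search-result analysis; a closer approximation of the described approach would replace the word-overlap cosine with a semantic relatedness measure and add a coherence score.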