Millions of people turn to the web to locate information of interest to them, and searching is the primary access method for many of these users. During search, users visit a web search engine and use its interface to specify a query (typically a few keywords) that best describes their information need. Once a query is issued, the engine's retrieval modules identify a set of potentially relevant pages in the engine's index and return them to the user, ordered by their relevance to the query keywords. Currently, all major search engines display search results as a ranked list of URLs (pointing to the relevant pages' physical location on the web), accompanied by the returned pages' titles and small text fragments that summarize the context of the search keywords. Such text fragments are widely known as snippets, and they offer a glimpse of the returned pages' contents. In general, text snippets extracted from the retrieved pages indicate the pages' usefulness to the query intention, and they help users browse search results and decide which pages to visit. Thus far, snippet extraction has relied on statistical methods to determine which text fragments contain most of the query keywords. Typically, the first two text nuggets in the page's contents that contain the query keywords are merged to produce the final snippet that accompanies the page's title and URL in the search results. Unfortunately, statistically generated snippets are not always representative of the pages' contents, and they are not always closely related to the query intention. Such snippets might mislead web users into visiting pages of little interest or usefulness to them.
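The statistical baseline described above (merging the first two text nuggets that contain query keywords) can be illustrated with a minimal Python sketch. It is not the implementation of any particular search engine: the sentence-based splitting, the exact-token keyword match, the " ... " separator, and the function names `split_fragments` and `baseline_snippet` are all assumptions made for illustration.

```python
import re


def split_fragments(text):
    """Split page text into sentence-like fragments (a simplifying assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def baseline_snippet(page_text, query, max_fragments=2):
    """Merge the first fragments that contain at least one query keyword.

    Hypothetical helper: matching is exact and case-insensitive on whole
    tokens, with no stemming and no notion of query intention.
    """
    keywords = {w.lower() for w in re.findall(r"\w+", query)}
    hits = []
    for fragment in split_fragments(page_text):
        words = {w.lower() for w in re.findall(r"\w+", fragment)}
        if keywords & words:
            hits.append(fragment)
        if len(hits) == max_fragments:
            break
    return " ... ".join(hits)


page = (
    "Jaguars are large cats. The Jaguar car brand was founded in 1922. "
    "Many people like cars. The jaguar lives in the Americas."
)
print(baseline_snippet(page, "jaguar habitat"))
```

For the query "jaguar habitat", this sketch picks up the sentence about the car brand simply because it appears first and contains a keyword, which is the kind of mismatch with the query intention that the paragraph above points out.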
In this article, we propose a snippet selection technique that identifies, within the contents of the query-relevant pages, those text fragments that are both highly relevant to the query intention and expressive of the pages' entire contents. The motive for our work is to help web users make informed decisions before clicking on a page in the list of search results. Towards this goal, we first show how to analyze search results in order to decipher the query intention. Then, we process the content of the query-matching pages in order to identify text fragments that correlate strongly with the query semantics. Finally, we evaluate the query-related text fragments in terms of coherence and expressiveness, and pick from every retrieved page the text nugget that correlates strongly with the query intention and is also highly representative of the page's content. A thorough evaluation over a large number of web pages and queries suggests that the proposed technique extracts good-quality text snippets with high precision and recall, outperforming existing snippet selection methods. Our study also reveals that the snippets delivered by our method can help web users decide which results to click. Overall, our study suggests that semantically driven snippet selection can be used to augment traditional snippet extraction approaches, which depend mainly on the statistical properties of words within a text.
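The final selection step, picking from each page the fragment that is both related to the query and representative of the page, can be sketched in Python as follows. This is only a rough stand-in, not the method of the article: query relevance is approximated by lexical keyword overlap rather than by query semantics, expressiveness by cosine similarity between a fragment's term counts and the whole page's term counts, the two scores are combined by a simple product, and the coherence criterion is omitted. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens of a text."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_snippet(fragments, query):
    """Return the fragment with the best combined score, or None.

    Assumed scoring (illustrative only):
      relevance      = share of query terms that occur in the fragment
      expressiveness = cosine similarity of fragment vs. whole page
      score          = relevance * expressiveness
    """
    page_vec = Counter(t for f in fragments for t in tokens(f))
    query_terms = set(tokens(query))
    best, best_score = None, 0.0
    for fragment in fragments:
        frag_vec = Counter(tokens(fragment))
        if query_terms:
            relevance = len(query_terms & frag_vec.keys()) / len(query_terms)
        else:
            relevance = 0.0
        score = relevance * cosine(frag_vec, page_vec)
        if score > best_score:
            best, best_score = fragment, score
    return best
```

A fragment that shares no term with the query scores zero and is never selected, so `select_snippet` returns `None` when nothing on the page matches. Replacing the overlap measure with a semantic relatedness measure would bring the sketch closer in spirit to the semantically driven selection the article argues for.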