In recent years, users searching the internet for useful information have relied on the result snippets of a web search engine to grasp the content of web pages. However, these snippets are often so short that users cannot tell from them whether the content of a page is relevant. To address this problem, we propose a method for grasping the content of each web page and extracting the part of the page that is related to the query keywords. The method is more effective than conventional snippet-based methods because we regard the content as a set of words in the text of a web page and generate a content-density distribution from both the position and the influence of each word. Our experiments show that the method is useful for grasping the influence of the extracted web text.
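The abstract does not give the formula for the content-density distribution, so the following is only a minimal sketch of one way such a distribution could be built. It assumes that each occurrence of a query keyword spreads its influence over neighbouring word positions through a Gaussian kernel, and that the extracted part is the fixed-size window with the highest total density. The function names, the `weights` mapping, the `sigma` spread, and the `window` size are all hypothetical and are not taken from the paper.

```python
import math


def content_density(words, query_terms, weights=None, sigma=5.0):
    """Return one density value per word position.

    Assumption (not from the paper): each query-term occurrence contributes
    a Gaussian bump centred on its position, scaled by an influence weight.
    """
    query = {t.lower() for t in query_terms}
    weights = weights or {}
    # (position, influence) for every occurrence of a query term
    hits = [
        (i, weights.get(w.lower(), 1.0))
        for i, w in enumerate(words)
        if w.lower() in query
    ]
    density = []
    for pos in range(len(words)):
        total = sum(
            wt * math.exp(-((pos - i) ** 2) / (2.0 * sigma ** 2))
            for i, wt in hits
        )
        density.append(total)
    return density


def extract_passage(words, query_terms, window=10, weights=None, sigma=5.0):
    """Return the window of consecutive words with the highest total density."""
    if not words:
        return ""
    window = min(window, len(words))
    density = content_density(words, query_terms, weights, sigma)
    best_start, best_sum = 0, float("-inf")
    for start in range(len(words) - window + 1):
        current = sum(density[start:start + window])
        if current > best_sum:  # ties keep the earliest window
            best_start, best_sum = start, current
    return " ".join(words[best_start:best_start + window])


if __name__ == "__main__":
    text = "cats sleep all day . solar panels convert sunlight into power . dogs bark"
    print(extract_passage(text.split(), ["solar", "power"], window=6, sigma=2.0))
```

In this sketch a larger `sigma` lets a keyword influence more distant words, and `weights` could carry a per-term importance such as an IDF score; both choices are illustrative rather than the authors' definition of influence.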