In the probabilistic language modeling approach to information retrieval, a model is estimated individually for each document. However, as Web pages become more complex, a single page may contain several blocks that discuss different topics. Consequently, the performance of a statistical model for a Web document tends to be degraded by this mixture of topics. In this paper, we argue that segmenting a Web page into several relatively independent blocks assists language modeling, and we propose a Block-based Language Modeling (BLM) approach. Unlike the standard method, BLM splits the modeling process into two parts: the probability of a query occurring in a block, and the probability of a block occurring in a Web page. Given a query, pages with more relevant blocks then tend to be retrieved. Experimental results show that, when a unigram model is used, our approach outperforms the original language modeling approach for Web search in most cases.
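The two-part decomposition described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the page score is taken to be the mixture sum over blocks of P(Q | block) · P(block | page), P(block | page) is approximated by the block's share of the page's tokens, and the unigram block model uses Jelinek-Mercer smoothing against collection statistics. The function names `unigram_prob` and `blm_score`, the parameter `lam`, and the toy pages are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import prod


def unigram_prob(term, block_counts, block_len, coll_counts, coll_len, lam=0.5):
    """Jelinek-Mercer smoothed unigram probability of a term in a block (assumed smoothing)."""
    p_block = block_counts[term] / block_len if block_len else 0.0
    p_coll = coll_counts[term] / coll_len if coll_len else 0.0
    return (1 - lam) * p_block + lam * p_coll


def blm_score(query, page_blocks, coll_counts, coll_len, lam=0.5):
    """Score a page as sum over blocks of P(query | block) * P(block | page).

    P(block | page) is assumed to be the block's fraction of the page's tokens.
    """
    page_len = sum(len(block) for block in page_blocks)
    if page_len == 0:
        return 0.0
    score = 0.0
    for block in page_blocks:
        if not block:
            continue
        counts = Counter(block)
        p_query_given_block = prod(
            unigram_prob(t, counts, len(block), coll_counts, coll_len, lam)
            for t in query
        )
        p_block_given_page = len(block) / page_len
        score += p_query_given_block * p_block_given_page
    return score


# Toy collection: each page is a list of already-segmented, tokenized blocks.
pages = {
    "focused": [["jaguar", "car", "engine", "car"],
                ["login", "menu", "home", "search"]],
    "offtopic": [["weather", "rain", "sun", "cloud"],
                 ["login", "menu", "home", "search"]],
}

# Collection statistics used only for smoothing.
coll_counts = Counter(t for blocks in pages.values() for b in blocks for t in b)
coll_len = sum(coll_counts.values())

query = ["jaguar", "car"]
scores = {name: blm_score(query, blocks, coll_counts, coll_len)
          for name, blocks in pages.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting the page that contains a block concentrated on the query terms ranks first, which matches the intuition that pages with more relevant blocks tend to be retrieved. A real system would also need a page segmentation step and a principled estimate of block importance, neither of which is sketched here.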