We investigate focused retrieval techniques that deal with the increasing amount of structure on the web. Our approach is to combine multiple representations of web information in a common framework based on statistical language models. In this framework, a topical language model of the actual language use on web pages about a given topic (such as arts, business, entertainment, or education) can be derived from the unigrams and bigrams in the plain text of those pages. Similarly, models of page structure can be derived to distinguish between blogs, FAQs, personal web pages, and other page types. Structural characteristics of a web page include, among others, tag-name statistics and parent-child tag pairs. We will build a multi-level language model to exploit the information contained in the topical language models and the structure models. The .GOV2 corpus will serve as the test collection, on which queries will be run against different topical categories and against web pages with different structures. We plan to develop so-called parsimonious models to derive a compact representation and to handle dependencies between the different representations of the data.
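To make the two kinds of evidence concrete, the following is a minimal sketch of how the raw statistics named above (unigrams, bigrams, tag-name counts, and parent-child tag pairs) might be collected from a single page and turned into maximum-likelihood distributions. It is an illustration under stated assumptions, not the method proposed here: the names `PageFeatures`, `extract`, `mle`, and `VOID_TAGS` are hypothetical, tokenization is plain whitespace splitting, and no smoothing or parsimonious estimation is applied.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: a small, incomplete set of tags that never have a closing tag.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class PageFeatures(HTMLParser):
    """Collects text tokens and tag statistics from one HTML page."""

    def __init__(self):
        super().__init__()
        self.stack = []                 # currently open tags
        self.tags = Counter()           # tag-name counts
        self.parent_child = Counter()   # (parent tag, child tag) counts
        self.words = []                 # lowercased text tokens in page order

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if self.stack:
            self.parent_child[(self.stack[-1], tag)] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def extract(html):
    """Return (unigrams, bigrams, tag counts, parent-child counts)."""
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    unigrams = Counter(parser.words)
    # Simplification: bigrams are taken across element boundaries.
    bigrams = Counter(zip(parser.words, parser.words[1:]))
    return unigrams, bigrams, parser.tags, parser.parent_child


def mle(counts):
    """Maximum-likelihood estimate: relative frequency of each event."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


page = "<html><body><h1>Web retrieval</h1><p>web pages <b>web</b></p></body></html>"
unigrams, bigrams, tags, parent_child = extract(page)
topical_model = mle(unigrams)          # P(term | page)
structure_model = mle(parent_child)    # P(parent-child pair | page)
```

Aggregating such counts over all pages of a topic or page type, rather than a single page, would give the topical and structural models described above; how the levels are then combined is left open here.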