We participated in the WebCLEF 2005 monolingual task. In this task, a search system aims to retrieve relevant documents from a multilingual corpus of Web documents taken from European government Web sites. Both the documents and the queries are written in a wide range of European languages. A challenge in this setting is to detect the language of documents and topics, and to process them appropriately. We develop a language-specific technique that applies the appropriate stemmer and removes the appropriate stopwords from the queries. We represent documents using three fields, namely content, title, and the anchor text of incoming hyperlinks. We use a technique called per-field normalisation, which extends the Divergence From Randomness (DFR) framework, to normalise the term frequencies and to combine them across the three fields. We also employ the length of the URL path of Web documents. The ranking combines language-specific stemming, where applied, with per-field normalisation. We use our Terrier platform for all our experiments. Our techniques performed strongly: our runs were the four best-performing runs overall, and included the best-performing run without metadata in the monolingual task. The best run uses only per-field normalisation, without stemming.
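To make the idea of per-field normalisation concrete, the following is a minimal sketch rather than the implementation used in these experiments. It assumes the commonly cited "Normalisation 2F" form, in which each field's term frequency is normalised by that field's length relative to its collection average, and the results are summed with a weight per field. The function name, the field weights, and the per-field hyperparameters `c` are all hypothetical values chosen for illustration; they are not taken from the paper or from Terrier's API.

```python
import math


def per_field_tfn(tf, length, avg_length, weight, c):
    """Combine per-field term frequencies into one normalised frequency.

    Assumed form (Normalisation 2F style):
        tfn = sum_f weight[f] * tf[f] * log2(1 + c[f] * avg_length[f] / length[f])

    All arguments are dicts keyed by field name.
    """
    tfn = 0.0
    for field, freq in tf.items():
        # A field with no occurrences, or an empty field, contributes nothing.
        if freq == 0 or length[field] == 0:
            continue
        ratio = avg_length[field] / length[field]
        tfn += weight[field] * freq * math.log2(1.0 + c[field] * ratio)
    return tfn


# Hypothetical statistics for one term in one document.
tf = {"content": 2, "title": 0, "anchor": 1}
length = {"content": 100, "title": 5, "anchor": 10}
avg_length = {"content": 100.0, "title": 5.0, "anchor": 30.0}
weight = {"content": 1.0, "title": 2.0, "anchor": 0.5}  # illustrative weights
c = {"content": 1.0, "title": 1.0, "anchor": 1.0}       # illustrative hyperparameters

combined = per_field_tfn(tf, length, avg_length, weight, c)
```

The combined frequency would then be passed to a DFR weighting model in place of the raw term frequency. Keeping a separate `c` for each field lets short fields such as titles and anchor text be normalised differently from the body content.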