Glean: using syntactic information in document filtering
Information Processing and Management: an International Journal
A critical examination of TDT's cost function
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
WWW '03 Proceedings of the 12th international conference on World Wide Web
Extracting unstructured data from template generated web documents
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Repeatable evaluation of search services in dynamic environments
ACM Transactions on Information Systems (TOIS)
Hi-index | 0.00 |
We describe an evaluation of result set filtering techniques for providing ultra-high precision in the task of presenting related news for general web queries. In this task, the negative user experience generated by retrieving non-relevant documents has a much worse impact than not retrieving relevant ones. We adapt cost-based metrics from the document filtering domain to this result filtering problem in order to explicitly examine the tradeoff between missing relevant documents and retrieving non-relevant ones. A large manual evaluation of three simple threshold filters shows that the basic approach of counting matching title terms outperforms also incorporating selected abstract terms based on part-of-speech or higher-level linguistic structures. Simultaneously, leveraging these cost-based metrics allows us to explicitly determine what other tasks would benefit from these alternative techniques.