The continued growth of very large data environments such as Westlaw, Dialog, and the World Wide Web increases the importance of effective and efficient database selection and searching. Recent research has focused on autonomous and automatic collection selection, searching, and results merging in distributed environments. These studies often rely on TREC data and queries for experimentation. We have extended this work to West's online production environment, where thousands of legal, financial, and news databases are accessed by up to a quarter-million professional users each day. Using the WIN natural language search engine, a cousin to UMass's INQUERY, along with a collection retrieval inference network (CORI) to provide database scoring, we examine the effect that a set of optimized parameters has on database selection performance. We also compare current language modeling techniques to this approach. Traditionally, West's information has been structured over 15,000 online databases, representing roughly 6 terabytes of textual data. Given the expense of running global searches in this environment, it is usually not practical to perform full document retrieval over the entire collection. It is therefore necessary to create a new infrastructure to support automatic database selection in the service of broader searching. In this research, we represent our operational environment in two distinct ways. First, we characterize the underlying physical databases that serve as a foundation for the entire Westlaw search system. Second, we create a rearchitected set of logical document collections that correspond to classes of high-level organizational concepts such as jurisdiction, practice area, and document type. Keeping the end user in mind, we focus on performance issues relating to optimal database selection, where domain experts have provided complete pre-hoc relevance judgments for collections characterized under each of our physical and logical database models.
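For readers unfamiliar with CORI-style database scoring, the following is a minimal sketch of the commonly published CORI belief formula, not the production WIN implementation. The constants 0.4, 50, and 150 are the usual published defaults and are assumed here; the abstract does not state which parameters were optimized or what values they took. The function names, the toy collections, and their statistics are hypothetical and serve only as an illustration.

```python
import math


def cori_belief(df, cf, cw, avg_cw, num_collections,
                default_belief=0.4, df_base=50.0, df_scale=150.0):
    """Belief that a collection satisfies one query term.

    df: documents in this collection containing the term
    cf: collections containing the term
    cw: word count of this collection; avg_cw: mean word count
    The three keyword defaults are assumed, commonly cited values.
    """
    if df == 0 or cf == 0:
        # No evidence for the term: fall back to the default belief.
        return default_belief
    t = df / (df + df_base + df_scale * cw / avg_cw)
    i = math.log((num_collections + 0.5) / cf) / math.log(num_collections + 1.0)
    return default_belief + (1.0 - default_belief) * t * i


def rank_collections(query_terms, collections):
    """Rank collections by mean per-term belief, best first."""
    n = len(collections)
    avg_cw = sum(c["cw"] for c in collections.values()) / n
    scores = {}
    for name, stats in collections.items():
        beliefs = []
        for term in query_terms:
            cf = sum(1 for c in collections.values() if c["df"].get(term, 0) > 0)
            beliefs.append(cori_belief(stats["df"].get(term, 0), cf,
                                       stats["cw"], avg_cw, n))
        scores[name] = sum(beliefs) / len(beliefs)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical collection statistics, invented for illustration only.
toy_collections = {
    "case_law": {"cw": 1000, "df": {"negligence": 80, "tort": 60}},
    "news": {"cw": 1000, "df": {"negligence": 2, "merger": 90}},
    "finance": {"cw": 1000, "df": {"merger": 70}},
}
ranking = rank_collections(["negligence", "tort"], toy_collections)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this sketch a collection with no evidence for any query term receives exactly the default belief, so the constants mainly control how quickly document frequency lifts a collection above that floor. That is the kind of behavior a parameter-tuning study of database selection would adjust.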