Early user-system interaction for database selection in massive domain-specific online environments

Authors:
Jack G. Conrad; Joanne R. S. Claussen
Affiliations:
Thomson Legal & Regulatory, St. Paul, Minnesota; West Group
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2003

Abstract

The continued growth of very large data environments such as Westlaw and Dialog, in addition to the World Wide Web, increases the importance of effective and efficient database selection and searching. Current research focuses largely on completely autonomous and automatic selection, searching, and results merging in distributed environments. This fully automatic approach has significant deficiencies, including reliance upon thresholds below which databases with relevant documents are not searched (compromised recall). It also merges documents, often from disparate data sources that users may have discarded before their source selection task proceeded (diluted precision). We examine the impact that early user interaction can have on the process of database selection. After analyzing thousands of real user queries, we show that precision can be significantly increased when queries are categorized by the users themselves, then handled effectively by the system. Such query categorization strategies may eliminate limitations of fully automated query processing approaches. Our system harnesses the WIN search engine, a sibling to INQUERY, run against one or more authority sources when search is required. We compare our approach to one that does not recognize or utilize distinct features associated with user queries. We show that by avoiding a one-size-fits-all approach that restricts the role users can play in information discovery, database selection effectiveness can be appreciably improved.
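
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' system and does not use WIN or INQUERY: the database names, category labels, scores, threshold, and both function names are hypothetical, chosen only to show how a score threshold can skip a relevant database while a user-supplied category keeps it in play and excludes off-topic sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Database:
    name: str
    category: str  # hypothetical topical label assigned to the database
    score: float   # hypothetical relevance score for one query


def select_by_threshold(dbs, threshold):
    """Fully automatic selection: keep databases scoring at or above the threshold."""
    return [d.name for d in dbs if d.score >= threshold]


def select_by_user_category(dbs, category, top_k):
    """User-assisted selection: restrict to the user's category, then rank by score."""
    in_category = [d for d in dbs if d.category == category]
    ranked = sorted(in_category, key=lambda d: d.score, reverse=True)
    return [d.name for d in ranked[:top_k]]


# Invented scores for a single illustrative query.
DATABASES = [
    Database("state-cases", "case law", 0.62),
    Database("federal-cases", "case law", 0.35),
    Database("news-wire", "news", 0.58),
    Database("statutes", "legislation", 0.20),
]

automatic = select_by_threshold(DATABASES, threshold=0.5)
assisted = select_by_user_category(DATABASES, category="case law", top_k=2)

print(automatic)  # ['state-cases', 'news-wire']
print(assisted)   # ['state-cases', 'federal-cases']
```

In this toy data the threshold drops "federal-cases" (a recall loss if it holds relevant documents) and admits "news-wire" (a precision loss if the user wanted case law only), whereas the category-restricted selection does neither.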