Collaborative research - digital government: a language modeling approach to metadata for cross-database linkage and search

Authors:
W. Bruce Croft; Jamie Callan
Affiliations:
University of Massachusetts Amherst; Carnegie Mellon University
Venue:
dg.o '04 Proceedings of the 2004 annual national conference on Digital government research
Year:
2004

Citing 8
Cited 0

Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)

QuASM: a system for question answering using semi-structured data
Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries

Using sampled data and regression to merge search engine results
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

Pruning long documents for distributed information retrieval
Proceedings of the eleventh international conference on Information and knowledge management

Table extraction using conditional random fields
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

Relevant document distribution estimation method for resource selection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

A semisupervised learning method to merge search engine results
ACM Transactions on Information Systems (TOIS)

Reducing storage costs for federated search of text databases
dg.o '03 Proceedings of the 2003 annual national conference on Digital government research

Abstract

This research demonstrates that language models are a sound and effective foundation for building large-scale, distributed information systems for government applications. It offers an alternative to human-generated metadata for locating information resources. Manual indexing is expensive, and studies show that people index inconsistently and inaccurately, which leads to poor retrieval effectiveness. Generating content descriptions automatically from the markup and structure of documents is less expensive and, when coupled with good search techniques, can locate relevant information more consistently. The evaluation testbeds for this research have been government databases such as those found in FedStats and GPO.