Machine learning for information architecture in a large governmental website

Authors:
Miles Efron; Jonathan Elsas; Gary Marchionini; Junliang Zhang
Affiliations:
University of North Carolina, Chapel Hill, NC (all authors)
Venue:
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Year:
2004

Abstract

This paper describes ongoing research into the application of machine learning techniques for improving access to governmental information in complex digital libraries. Under the auspices of the GovStat Project, our goal is to identify a small number of semantically valid concepts that adequately span the intellectual domain of a collection. The goal of this discovery is twofold. First, we desire a practical aid for information architects. Second, automatically derived document-concept relationships are a necessary precondition for real-world deployment of many dynamic interfaces. The current study compares concept learning strategies based on three document representations: keywords, titles, and full text. In statistical and user-based studies, human-created keywords provide significant improvements in concept learning over both title-only and full-text representations.