Integrating data and text mining processes for digital library applications

Authors:
Robert Sanderson;Paul Watry
Affiliations:
University of Liverpool, Liverpool, United Kingdom;University of Liverpool, Liverpool, United Kingdom
Venue:
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Year:
2007

Citing 8

Automatic document metadata extraction using support vector machines
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries

Tree Structures for Mining Association Rules
Data Mining and Knowledge Discovery

Untangling text data mining
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics

Grid-based digital libraries: Cheshire3 and distributed retrieval
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries

Indexing and searching tera-scale Grid-Based Digital Libraries
InfoScale '06 Proceedings of the 1st international conference on Scalable information systems

Cheshire3: retrieving from tera-scale grid-based digital libraries
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Bidirectional inference with the easiest-first strategy for tagging sequence data
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)

Cited 3

Content integration in digital libraries
AMC '09 Proceedings of the 2009 workshop on Ambient media computing

Digital Preservation in Grids and Clouds: A Middleware Approach
Journal of Grid Computing

Accelerating text mining workloads in a MapReduce-based distributed GPU environment
Journal of Parallel and Distributed Computing

Abstract

This paper explores the integration of text mining and data mining techniques, digital library systems, and computational and data grid technologies with the objective of developing an online classification service exemplar. We discuss the current research issues relating to the use of data mining algorithms and toolkits for textual data; the necessary changes within the Cheshire3 Information Framework to accommodate analysis workflows; the outcomes of a demonstrator based on the National Library of Medicine's Medline dataset; and the provision of comparable metrics for evaluation purposes. The prototype has resulted in extremely accurate online classification services and offers a novel method of supporting text mining and data mining within a highly scaled computational environment, integrated seamlessly into the digital library architecture.