This paper explores the integration of text mining and data mining techniques, digital library systems, and computational and data grid technologies, with the aim of developing an exemplar online classification service. We discuss current research issues in applying data mining algorithms and toolkits to textual data; the changes needed within the Cheshire3 Information Framework to accommodate analysis workflows; the outcomes of a demonstrator based on the National Library of Medicine's Medline dataset; and the provision of comparable metrics for evaluation purposes. The prototype has resulted in extremely accurate online classification services and offers a novel method of supporting text mining and data mining within a highly scaled computational environment, integrated seamlessly into the digital library architecture.