Creating a collection of metadata records from disparate and diverse sources often results in subject metadata of uneven, unreliable, and variable quality. Uniform, consistent, and enriched subject metadata allows users to discover material more easily, browse the collection, and limit keyword search results by subject. We demonstrate how statistical topic models are useful for subject metadata enrichment. We describe some of the challenges of metadata enrichment at very large scale (10 million metadata records from 700 repositories in the OAIster Digital Library) when the metadata is highly heterogeneous (metadata about images and text, covering both cultural heritage material and scientific literature). We show how to improve the quality of the enriched metadata using both manual and statistical modeling techniques. Finally, we discuss some of the challenges of the production environment and demonstrate the value of the enriched metadata in a prototype portal.