Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning

  • Authors:
  • Chaitanya Chemudugunta, America Holloway, Padhraic Smyth, Mark Steyvers

  • Affiliations:
  • Department of Computer Science, University of California, Irvine (Chemudugunta, Holloway, Smyth); Department of Cognitive Science, University of California, Irvine (Steyvers)

  • Venue:
  • ISWC '08 Proceedings of the 7th International Conference on The Semantic Web
  • Year:
  • 2008


Abstract

Human-defined concepts are fundamental building blocks in constructing knowledge bases such as ontologies. Statistical learning techniques provide an alternative automated approach to concept definition, driven by data rather than prior knowledge. In this paper, we propose a probabilistic modeling framework that combines both human-defined concepts and data-driven topics in a principled manner. The methodology we propose is based on statistical topic models (also known as latent Dirichlet allocation models). We demonstrate the utility of this general framework in two ways. We first illustrate how the methodology can be used to automatically tag Web pages with concepts from a known set of concepts without any need for labeled documents. We then perform a series of experiments that quantify how combining human-defined semantic knowledge with data-driven techniques leads to better language models than can be obtained with either alone.
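
To make the idea concrete, below is a minimal, illustrative sketch of a concept-topic model in the spirit of the abstract: a topic model in which some "topics" are tied to human-defined concepts (each restricted to a fixed concept word set) while the remaining topics are unconstrained and learned from data. This is an assumption-laden approximation, not the authors' exact model; the toy corpus, the concept definitions, the hyperparameters, and all variable names are made up for illustration.

```python
"""Sketch: collapsed Gibbs sampling for a hybrid concept/data-driven topic model.
Topics 0..C-1 correspond to human-defined concepts and may only emit words
from their concept word sets; the remaining topics are ordinary LDA topics."""
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus (hypothetical): each document is a list of word ids into `vocab`.
vocab = ["gene", "dna", "protein", "court", "law", "judge", "data", "model"]
docs = [[0, 1, 2, 6, 7], [3, 4, 5, 6], [0, 2, 1, 1, 7], [4, 3, 5, 5, 6]]

# Human-defined concepts: allowed word ids per concept (assumed given, e.g. from an ontology).
concepts = {0: {0, 1, 2},   # a "genetics" concept
            1: {3, 4, 5}}   # a "legal" concept
n_free = 2                   # number of unconstrained, data-driven topics
K = len(concepts) + n_free   # total mixture components = concepts + free topics
V = len(vocab)
alpha, beta = 0.1, 0.01      # symmetric Dirichlet hyperparameters (arbitrary)

ndk = np.zeros((len(docs), K))     # document-topic counts
nkw = np.zeros((K, V))             # topic-word counts
nk = np.zeros(K)                   # topic totals
z = [[0] * len(d) for d in docs]   # current topic assignment per token

def allowed(k, w):
    """Concept components may only generate words from their concept set."""
    return k >= len(concepts) or w in concepts[k]

# Random initialization that respects the concept constraints.
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = rng.choice([k for k in range(K) if allowed(k, w)])
        z[d][i] = k
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Collapsed Gibbs sweeps: standard LDA update, with disallowed components zeroed out.
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            p *= [allowed(kk, w) for kk in range(K)]
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# "Tagging" without labeled documents: read off each document's posterior
# weight on the concept components.
theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
for d in range(len(docs)):
    print(f"doc {d}: concept weights {theta[d, :len(concepts)].round(2)}")
```

Running this prints, per document, its estimated mixture weight over the two hypothetical concepts; documents dominated by genetics vocabulary load on concept 0 and legal documents on concept 1, while off-concept words (e.g. "data", "model") are absorbed by the free topics. The same idea, with real concept vocabularies and a large corpus, is what allows concept tagging without labeled documents and lets the free topics improve the language model where the concepts fall short.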