Probabilistic topic decomposition of an eighteenth-century American newspaper

Authors:
David J. Newman; Sharon Block
Affiliations:
Department of Computer Science, University of California, Irvine, CA 92697-3100; Department of History, University of California, Irvine, CA 92697-3275
Venue:
Journal of the American Society for Information Science and Technology
Year:
2006

Abstract

We use a probabilistic mixture decomposition method to determine the topics in the Pennsylvania Gazette, a major colonial U.S. newspaper published from 1728 to 1800. We assess the value of several topic decomposition techniques for historical research, comparing their accuracy and efficacy. After determining the topics covered by the 80,000 articles and advertisements in the Gazette's entire eighteenth-century run, we calculate how the prevalence of those topics changed over time and illustrate our findings with historically relevant examples. This approach reveals important information about the content of this colonial newspaper and suggests the value of such methods for a more complete understanding of early American print culture and society. © 2006 Wiley Periodicals, Inc.
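The abstract does not spell out the decomposition itself. As a hedged illustration only, the sketch below uses probabilistic latent semantic analysis (PLSA) fitted by expectation-maximization, which is one standard probabilistic mixture decomposition. It is not necessarily the exact model, preprocessing, or settings used for the Gazette study. It then computes per-year topic prevalence. The function names `plsa` and `topic_prevalence_by_year`, the small smoothing constants, and the choice to measure prevalence as each topic's share of word tokens per year are all assumptions made for this example.

```python
import random
from collections import defaultdict


def plsa(docs, vocab_size, num_topics, iters=50, seed=0):
    """Fit PLSA by EM (illustrative sketch, not the paper's code).

    docs: list of {word_id: count} dicts, one per article/advertisement.
    Returns (p_z_d, p_w_z): P(topic | doc) and P(word | topic).
    """
    rng = random.Random(seed)

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Random strictly positive initialization of both distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(num_topics)]) for _ in docs]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(vocab_size)]) for _ in range(num_topics)]

    for _ in range(iters):
        # Tiny smoothing keeps every probability nonzero (an assumption).
        new_w_z = [[1e-12] * vocab_size for _ in range(num_topics)]
        new_z_d = []
        for d, doc in enumerate(docs):
            acc = [1e-12] * num_topics
            for w, n in doc.items():
                # E-step: posterior P(z | d, w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][k] * p_w_z[k][w] for k in range(num_topics)]
                total = sum(post)
                for k in range(num_topics):
                    r = n * post[k] / total  # expected count of w in d from topic k
                    acc[k] += r
                    new_w_z[k][w] += r
            new_z_d.append(normalize(acc))
        # M-step: renormalize the expected counts (old parameters used above).
        p_z_d = new_z_d
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


def topic_prevalence_by_year(p_z_d, doc_lengths, years):
    """Share of word tokens attributed to each topic, per year (assumed measure)."""
    num_topics = len(p_z_d[0])
    mass = defaultdict(lambda: [0.0] * num_topics)
    for theta, n, y in zip(p_z_d, doc_lengths, years):
        for k in range(num_topics):
            mass[y][k] += n * theta[k]
    return {y: [m / sum(v) for m in v] for y, v in sorted(mass.items())}
```

Usage on a hypothetical toy corpus (word ids stand in for vocabulary terms):

```python
docs = [{0: 3, 1: 2, 2: 4}, {0: 1, 2: 5}, {3: 3, 4: 2, 5: 4}, {4: 4, 5: 1}]
p_z_d, p_w_z = plsa(docs, vocab_size=6, num_topics=2, iters=30)
prev = topic_prevalence_by_year(p_z_d, [sum(d.values()) for d in docs],
                                [1750, 1750, 1760, 1760])
```

Weighting each document by its length means long articles count more than short advertisements; averaging P(topic | doc) without weights is an equally plausible alternative.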