Historical analysis of legal opinions with a sparse mixed-effects latent variable model

Authors:
William Yang Wang; Elijah Mayfield; Suresh Naidu; Jeremiah Dittmar
Affiliations:
Carnegie Mellon University; Carnegie Mellon University; Columbia University; American University
Venue:
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Year:
2012

Abstract

We propose a latent variable model to enhance historical analysis of large corpora. This work extends prior work in topic modelling by incorporating metadata, and the interactions between the components of that metadata, in a general way. To test the model, we collect a corpus of slavery-related United States property law judgements sampled from the years 1730 to 1866. We study the language use in these legal cases, with a special focus on shifts in opinions on controversial topics across different regions. Because this is a longitudinal data set, we are also interested in understanding how these opinions change over the course of decades. We show that the joint learning scheme of our sparse mixed-effects model improves on other state-of-the-art generative and discriminative models on the region and time period identification tasks. Experiments show that our sparse mixed-effects model is quantitatively more accurate and produces qualitatively interesting results, and that these improvements are robust across different parameter settings.