Using Wikipedia to learn semantic feature representations of concrete concepts in neuroimaging experiments

Authors:
Francisco Pereira; Matthew Botvinick; Greg Detre
Affiliations:
Psychology Department and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, United States (all three authors)
Venue:
Artificial Intelligence
Year:
2013

References (12):
Latent Dirichlet allocation. The Journal of Machine Learning Research.
Applied morphological processing of English. Natural Language Engineering.
CLAWS4: the tagging of the British National Corpus. COLING '94 Proceedings of the 15th conference on Computational linguistics, Volume 1.
Learning to Decode Cognitive States from Brain Images. Machine Learning.
Word sense disambiguation: A survey. ACM Computing Surveys (CSUR).
Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery. ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning.
Connecting language to the world. Artificial Intelligence, Special volume on connecting language to the world.
EEG responds to conceptual stimuli and corpus semantics. EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 2.
From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research.
Learning semantic features for fMRI data from definitional text. CN '10 Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics.
Acquiring human-like feature-based conceptual representations from corpora. CN '10 Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics.
Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. CN '10 Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics.

Cited by (1):
Collaboratively built semi-structured content and Artificial Intelligence: The story so far. Artificial Intelligence.

Abstract

In this paper we show that a corpus of a few thousand Wikipedia articles about concrete or visualizable concepts can be used to produce a low-dimensional semantic feature representation of those concepts. The purpose of such a representation is to serve as a model of the mental context of a subject during functional magnetic resonance imaging (fMRI) experiments. A recent study by Mitchell et al. (2008) [19] showed that it was possible to predict fMRI data acquired while subjects thought about a concrete concept, given a representation of those concepts in terms of semantic features obtained with human supervision. We use topic models on our corpus to learn semantic features from text in an unsupervised manner, and show that these features can outperform those in Mitchell et al. (2008) [19] in demanding 12-way and 60-way classification tasks. We also show that these features can be used to uncover similarity relations in brain activation for different concepts which parallel those relations in behavioral data from human subjects.
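
The unsupervised step described in the abstract, learning a low-dimensional feature vector per concept from article text with a topic model, can be illustrated with a minimal sketch. It assumes latent Dirichlet allocation fitted by collapsed Gibbs sampling; the abstract only says "topic models", so the choice of LDA, the sampler, the hyperparameters `alpha` and `beta`, the function name `lda_gibbs`, and the four-document toy corpus are all illustrative assumptions rather than the authors' pipeline. Each concept's topic proportions play the role of its semantic feature vector.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not the paper's code).

    docs: list of token lists, one per concept article.
    Returns one list of topic proportions per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional distribution over topics for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed topic proportions: the per-concept feature vectors.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        features.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return features


# Hypothetical toy corpus standing in for Wikipedia articles about concepts.
corpus = {
    "hammer": "tool handle metal nail hit tool wood handle".split(),
    "saw": "tool blade metal cut wood handle tool".split(),
    "dog": "animal fur tail bark pet animal leg".split(),
    "cat": "animal fur tail pet claw animal leg".split(),
}
features = lda_gibbs(list(corpus.values()), n_topics=2)
for concept, vec in zip(corpus, features):
    print(concept, [round(x, 3) for x in vec])
```

In a setting like the one the abstract describes, vectors of this kind would then serve as the inputs from which fMRI activation is predicted; that regression and the 12-way and 60-way classification evaluation are not shown here.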