WordICA: Emergence of linguistic representations for words by independent component analysis

Authors:
Timo Honkela; Aapo Hyvärinen; Jaakko J. Väyrynen
Affiliations:
Adaptive Informatics Research Centre, Aalto University School of Science and Technology, P.O. Box 15400, FI-00076 Aalto, Finland, e-mail: timo.honkela@tkk.fi; Department of Mathematics and Statistics, Department of Computer Science, University of Helsinki, P.O. Box 68, FI-00014 University of Helsinki, Finland and Helsinki Institute for Information Techn ...; Adaptive Informatics Research Centre, Aalto University School of Science and Technology, P.O. Box 15400, FI-00076 Aalto, Finland
Venue:
Natural Language Engineering
Year:
2010

Abstract

We explore the use of independent component analysis (ICA) for the automatic extraction of linguistic roles or features of words. The extraction is based on the unsupervised analysis of text corpora. We contrast ICA with singular value decomposition (SVD), which is widely used in statistical text analysis in general and in latent semantic analysis (LSA) in particular. The representations found by SVD, however, cannot easily be interpreted by humans. In contrast, ICA applied to word context data gives distinct features that reflect linguistic categories. In this paper, we provide justification for our approach, called WordICA, present the WordICA method in detail, compare the obtained results with traditional linguistic categories and with the results achieved using an SVD-based method, and discuss the use of the method in practical natural language engineering solutions such as machine translation systems. As the WordICA method is based on unsupervised learning and thus provides a general means for efficient knowledge acquisition, we foresee that the approach has clear potential for practical applications.
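
The contrast between SVD and ICA on word context data can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the one-word context window, the `log1p` damping of counts, the choice of three components, the tanh nonlinearity, and all function names are assumptions made for illustration. The sketch builds a word-by-context co-occurrence matrix, reduces and whitens it with SVD, and then rotates the whitened data with a plain NumPy version of symmetric FastICA.

```python
import numpy as np

def cooccurrence_matrix(tokens, window=1):
    """Count how often each vocabulary word occurs within `window` positions of each word."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for pos, word in enumerate(tokens):
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for ctx in range(lo, hi):
            if ctx != pos:
                # rows are context features, columns are the words being described
                counts[index[tokens[ctx]], index[word]] += 1
    return vocab, counts

def whiten_by_svd(X, n_components):
    """Centre X (features x words) and project onto the leading singular directions,
    scaled so that the result has identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    K = (U[:, :n_components] / s[:n_components]).T * np.sqrt(X.shape[1])
    return K @ Xc

def sym_decorrelate(W):
    """Symmetric orthogonalisation: W <- (W W^T)^(-1/2) W."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    vals = np.maximum(vals, 1e-12)  # guard against division by zero
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W

def symmetric_fastica(Z, n_iter=200, seed=0):
    """Symmetric FastICA with a tanh nonlinearity on whitened data Z (components x words)."""
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = (G @ Z.T) / Z.shape[1] - np.diag((1.0 - G**2).mean(axis=1)) @ W
        W = sym_decorrelate(W_new)
    return W

# Toy corpus (assumption): far too small for meaningful linguistic features.
corpus = ("the cat sat on the mat the dog sat on the rug a cat saw a dog "
          "a dog saw a cat the cat ate the fish a dog ate a bone").split()

vocab, counts = cooccurrence_matrix(corpus, window=1)
X = np.log1p(counts)                # assumption: damp large counts
Z = whiten_by_svd(X, n_components=3)  # SVD-based representation (whitened)
W = symmetric_fastica(Z)
S = W @ Z                           # ICA representation: one column of features per word

for word, features in zip(vocab, S.T):
    print(f"{word:>5}", np.round(features, 2))
```

On whitened data, ICA amounts to an additional rotation `W` of the SVD-based representation `Z`. Both span the same subspace, and the abstract's claim concerns which choice of axes yields features that align with linguistic categories. A corpus of realistic size would be needed to observe that effect.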