We explore the use of independent component analysis (ICA) for the automatic extraction of linguistic roles or features of words. The extraction is based on the unsupervised analysis of text corpora. We contrast ICA with singular value decomposition (SVD), which is widely used in statistical text analysis in general and in latent semantic analysis (LSA) in particular. The representations found by SVD, however, cannot easily be interpreted by humans. ICA applied to word context data, in contrast, yields distinct features that reflect linguistic categories. In this paper, we justify our approach, called WordICA, present the method in detail, compare its results with traditional linguistic categories and with those of an SVD-based method, and discuss its use in practical natural language engineering solutions such as machine translation systems. Because WordICA is based on unsupervised learning and thus provides a general means for efficient knowledge acquisition, we foresee clear potential for practical applications.
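To make "word context data" concrete, the following is a minimal sketch of how a word-by-context count matrix might be assembled from a tokenized corpus before any decomposition is applied. The abstract does not specify the window size, the choice of focus and context vocabularies, the matrix orientation, or any scaling, so the one-word window, the toy corpus, the logarithmic damping, and the function names `context_matrix` and `log_scale` are all assumptions made for illustration only, not the WordICA procedure itself.

```python
import math


def context_matrix(tokens, focus_words, context_words, window=1):
    """Count context-word occurrences near each focus word.

    Returns a list of rows, one per context word; each row holds one
    count per focus word. (Orientation and window size are assumptions.)
    """
    focus_index = {w: i for i, w in enumerate(focus_words)}
    ctx_index = {w: j for j, w in enumerate(context_words)}
    counts = [[0] * len(focus_words) for _ in context_words]
    for pos, tok in enumerate(tokens):
        if tok not in focus_index:
            continue
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for k in range(lo, hi):
            # Skip the focus position itself; count only known context words.
            if k != pos and tokens[k] in ctx_index:
                counts[ctx_index[tokens[k]]][focus_index[tok]] += 1
    return counts


def log_scale(counts):
    """Dampen raw frequencies with log(1 + c); an assumed preprocessing step."""
    return [[math.log(1 + c) for c in row] for row in counts]


# Toy corpus (hypothetical); a real analysis would use a large text collection.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
focus = ["cat", "dog", "mat", "rug", "sat"]
context = ["the", "on", "sat"]

X = context_matrix(corpus, focus, context, window=1)
X_scaled = log_scale(X)

for word, row in zip(context, X):
    print(f"{word:>4}: {row}")
```

In this toy matrix the columns for "cat" and "dog" are identical, as are those for "mat" and "rug", while "sat" has a different profile. Shared context profiles of this kind are the regularity that a decomposition such as ICA or SVD would then be asked to expose on real data.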