COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
This paper describes a data-driven method for hierarchical clustering of words. A large vocabulary of English words is clustered bottom-up, with respect to corpora ranging in size from 5 to 50 million words, using a greedy algorithm that tries to minimize the average loss of mutual information between adjacent classes. The resulting hierarchical clusters are then transformed into a bit-string representation (i.e., word bits) for all the words in the vocabulary. Introducing word bits into the ATR Decision-Tree POS Tagger is shown to significantly reduce the tagging error rate. Portability of word bits from one domain to another is also discussed.
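
The clustering and bit-string steps can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names `average_mutual_information` and `word_bits` are hypothetical. The sketch recomputes the mutual information from scratch for every candidate merge, which is practical only for toy vocabularies. The rule of labelling the two branches of each merge with 0 and 1 is an assumption about how the cluster tree maps to bits.

```python
from collections import Counter
from itertools import combinations
from math import log2


def average_mutual_information(bigrams, assign):
    """Mutual information between adjacent classes under the map word -> class."""
    pair = Counter()
    for (w1, w2), n in bigrams.items():
        pair[(assign[w1], assign[w2])] += n
    total = sum(pair.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in pair.items():
        left[c1] += n
        right[c2] += n
    return sum(
        (n / total) * log2(n * total / (left[c1] * right[c2]))
        for (c1, c2), n in pair.items()
    )


def word_bits(tokens):
    """Greedy bottom-up clustering of a token list; returns {word: bit string}."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = sorted(set(tokens))
    assign = {w: i for i, w in enumerate(vocab)}  # every word starts in its own class
    members = {i: [w] for i, w in enumerate(vocab)}
    bits = {w: "" for w in vocab}
    while len(members) > 1:
        best = None
        for a, b in combinations(sorted(members), 2):
            # Keeping the most mutual information after merging a and b
            # is the same as losing the least.
            trial = {w: (a if c == b else c) for w, c in assign.items()}
            score = average_mutual_information(bigrams, trial)
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        # Assumed bit rule: one branch gets 0, the other 1. Bits are prepended
        # because later merges sit closer to the root of the tree.
        for w in members[a]:
            bits[w] = "0" + bits[w]
        for w in members[b]:
            bits[w] = "1" + bits[w]
            assign[w] = a
        members[a] += members.pop(b)
    return bits


if __name__ == "__main__":
    corpus = "the cat sat on the mat the dog sat on the rug".split()
    for word, code in sorted(word_bits(corpus).items(), key=lambda kv: kv[1]):
        print(code, word)
```

Words that were merged early share a long common prefix, so truncating the bit strings to a fixed length yields coarser classes. Prefix tests of this kind are one way such codes could serve as features for a decision-tree tagger.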