This paper addresses the automatic classification of new lexical items into synonym sets on the basis of their cooccurrence data obtained from a corpus. Our goal is to examine how different types of linguistic preprocessing of the cooccurrence material affect classification accuracy. We compare several preprocessing techniques frequently used for this and similar tasks and draw conclusions about their relative merits. We find that a carefully chosen preprocessing procedure achieves a relative improvement in effectiveness of up to 88%, depending on the classification method, compared with window-based context delineation, while using a much smaller feature space.
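To make the baseline concrete, the following is a minimal sketch of window-based context delineation followed by a simple similarity-based assignment of a new word to a class. It is not the procedure evaluated in the paper: the toy corpus, the window size of 2, the two class labels, the function names (`window_contexts`, `cosine`, `classify`), and the choice of cosine similarity against summed class vectors are all assumptions made for illustration.

```python
from collections import Counter
import math


def window_contexts(tokens, target, size=2):
    """Count words within `size` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - size)
        hi = min(len(tokens), i + size + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(new_vec, class_vectors):
    """Return the class whose summed context vector is most similar."""
    return max(class_vectors, key=lambda c: cosine(new_vec, class_vectors[c]))


# Toy corpus (hypothetical, for illustration only).
tokens = (
    "the cat chased the mouse the dog chased the cat "
    "the car needs fuel the truck needs fuel "
    "the puppy chased the ball"
).split()

# Each class vector is the sum of its known members' context counts.
classes = {
    "animal": window_contexts(tokens, "cat") + window_contexts(tokens, "dog"),
    "vehicle": window_contexts(tokens, "car") + window_contexts(tokens, "truck"),
}

predicted = classify(window_contexts(tokens, "puppy"), classes)
print(predicted)
```

Note that the window admits every neighbouring token as a feature, including function words such as "the". Linguistic preprocessing of the kind the abstract compares would presumably filter or restructure these features, which is consistent with the smaller feature space reported.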