Features used for named entity recognition (NER) are often high dimensional, which causes overfitting when training data is insufficient. Dimensionality reduction can improve performance in such situations, and a number of approaches exist, based on either feature selection or feature extraction. In this paper we present a comprehensive comparative study of dimensionality reduction approaches applied to the NER task. To compare the approaches we consider two Indian languages, Hindi and Bengali, for which NER accuracies are still comparatively poor, primarily due to the scarcity of annotated corpora. For both languages, dimensionality reduction is found to improve classifier performance. A detailed comparative study of the effectiveness of several dimensionality reduction techniques is presented in this paper.
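To illustrate the feature-selection family of dimensionality reduction methods mentioned above, here is a minimal sketch of one common filter-style criterion, information gain, which ranks each feature by how much it reduces label entropy. The toy data and function names are our own illustration, not taken from the paper.

```python
import math
from collections import Counter

def information_gain(X, y, feature_idx):
    """Information gain of a binary feature with respect to labels y."""
    n = len(y)

    def entropy(labels):
        counts = Counter(labels)
        return -sum((c / len(labels)) * math.log2(c / len(labels))
                    for c in counts.values())

    base = entropy(y)
    # Partition the examples by the feature's value (0 or 1).
    parts = {0: [], 1: []}
    for row, label in zip(X, y):
        parts[row[feature_idx]].append(label)
    # Conditional entropy of the labels given the feature value.
    cond = sum((len(p) / n) * entropy(p) for p in parts.values() if p)
    return base - cond

# Toy data: 4 binary features; feature 0 perfectly predicts the label.
X = [(1, 0, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1), (0, 1, 0, 0)]
y = [1, 1, 0, 0]

gains = {j: information_gain(X, y, j) for j in range(4)}
# Keep only the top-k features (here k = 2) to reduce dimensionality.
top = sorted(gains, key=gains.get, reverse=True)[:2]
```

In a real NER setting the same ranking would be computed over the sparse binary feature vectors of the training tokens, and only the highest-scoring features retained before training the classifier.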