When a linear classifier cannot successfully classify the data, we often add combination features, i.e., products of several original features. Searching for effective combination features, namely feature engineering, requires domain-specific knowledge and considerable effort. We present an efficient algorithm for learning an L1-regularized logistic regression model with combination features. We propose to use the grafting algorithm with an efficient computation of gradients, which allows us to find the optimal weights without enumerating all combination features. Thanks to L1 regularization, the resulting model is very compact and supports very efficient inference. In experiments on NLP tasks, we show that the proposed method extracts effective combination features and achieves high performance with very few features.
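The grafting idea sketched in the abstract can be illustrated with a small Python example. The sketch below is an assumption-laden simplification, not the authors' implementation: it materializes all pairwise combination features up front (the paper's contribution is precisely to avoid this enumeration via an efficient gradient computation), and the helper names, learning rate, and regularization strength are illustrative. The grafting loop itself follows the standard recipe: start with an empty active set, repeatedly add the inactive feature whose loss gradient exceeds the L1 penalty, and re-optimize only the active weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(v, t):
    # proximal operator of the L1 penalty
    return np.sign(v) * max(abs(v) - t, 0.0)

def with_pairwise(X):
    # append combination (product) features for every pair of columns
    n, d = X.shape
    cols = [X] + [(X[:, i] * X[:, j])[:, None]
                  for i in range(d) for j in range(i + 1, d)]
    return np.hstack(cols)

def grafting_l1_logreg(X, y, lam=0.01, lr=0.5, outer=20, inner=300):
    """Grafting for L1-regularized logistic regression (illustrative sketch).

    Weights of inactive features stay exactly zero and are never touched;
    a feature enters the active set only when the magnitude of its loss
    gradient exceeds the L1 penalty lam (the KKT optimality condition).
    """
    n, d = X.shape
    w = np.zeros(d)
    active = []
    for _ in range(outer):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n            # gradient of the logistic loss
        g = np.abs(grad)
        g[active] = 0.0                     # consider inactive features only
        j = int(np.argmax(g))
        if g[j] <= lam:                     # no inactive feature violates KKT
            break
        active.append(j)
        for _ in range(inner):              # re-optimize the active set
            p = sigmoid(X @ w)
            grad = X.T @ (p - y) / n
            for k in active:
                w[k] = soft_threshold(w[k] - lr * grad[k], lr * lam)
    return w, active

# usage: XOR labels, which no linear model over x1, x2 alone can represent
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])
Xa = np.hstack([np.ones((4, 1)), with_pairwise(X)])  # bias, x1, x2, x1*x2
w, active = grafting_l1_logreg(Xa, y)
acc = float(np.mean((sigmoid(Xa @ w) > 0.5) == y))
```

On this toy problem the very first feature grafted in is the combination feature x1*x2, since it is the only column whose gradient is nonzero at the all-zero weight vector; the linear terms join the active set only afterwards. The actual algorithm in the paper computes these gradient maxima without building the combined feature matrix.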