PicAChoo: a tool for customizable feature extraction utilizing characteristics of textual data

Authors:
Jaeseok Myung;Jung-Yeon Yang;Sang-goo Lee
Affiliations:
Seoul National University, Seoul, Republic of Korea;Seoul National University, Seoul, Republic of Korea;Seoul National University, Seoul, Republic of Korea
Venue:
Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
Year:
2009

Abstract

Although documents contain hundreds of thousands of unique words, only a small number of those words are useful for intelligent services. Feature extraction has therefore become an important issue in fields such as information retrieval, text mining, and pattern recognition. Numerous tools support feature extraction, but most of them treat text as a simple literal. Text, however, is not just a literal; it is a semantically significant unit with linguistic characteristics. We therefore need customized extraction methods that take the characteristics of the source documents into account. PicAChoo, which stands for 'Pick And Choose', provides an environment for feature extraction methods that use sentence structure and the part-of-speech information of words. In addition, we propose dynamic composition of different extraction methods without hard-coding.
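
The idea of composing extraction methods dynamically, rather than hard-coding them, might be sketched roughly as follows. This is an illustrative assumption, not PicAChoo's actual interface: the `Token` type, the `keep_pos` and `keep_sentences` filters, the `REGISTRY` table, and the spec format are all hypothetical names introduced here, and the tokens are tagged by hand instead of by a real part-of-speech tagger.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical token type: assumes text was already split into sentences
# and tagged with part-of-speech labels by some upstream tool.
@dataclass(frozen=True)
class Token:
    text: str
    pos: str        # part-of-speech tag, e.g. "NOUN"
    sentence: int   # zero-based index of the sentence containing the token

Extractor = Callable[[List[Token]], List[Token]]

def keep_pos(*tags: str) -> Extractor:
    """Build an extractor that keeps tokens whose POS tag is in `tags`."""
    allowed = set(tags)
    return lambda tokens: [t for t in tokens if t.pos in allowed]

def keep_sentences(max_index: int) -> Extractor:
    """Build an extractor that keeps tokens from sentences 0..max_index."""
    return lambda tokens: [t for t in tokens if t.sentence <= max_index]

def compose(*steps: Extractor) -> Extractor:
    """Chain extractors so each one filters the output of the previous one."""
    def run(tokens: List[Token]) -> List[Token]:
        for step in steps:
            tokens = step(tokens)
        return tokens
    return run

# Hypothetical registry: a pipeline is described as data (names plus
# arguments), so a new combination needs no code change.
REGISTRY = {"pos": keep_pos, "first_sentences": keep_sentences}

def build_pipeline(spec: Sequence[Tuple[str, Sequence]]) -> Extractor:
    return compose(*(REGISTRY[name](*args) for name, args in spec))

doc = [
    Token("PicAChoo", "PROPN", 0),
    Token("extracts", "VERB", 0),
    Token("features", "NOUN", 0),
    Token("quickly", "ADV", 1),
    Token("words", "NOUN", 1),
]

# Keep only nouns and proper nouns that occur in the first sentence.
spec = [("first_sentences", [0]), ("pos", ["NOUN", "PROPN"])]
features = [t.text for t in build_pipeline(spec)(doc)]
print(features)  # ['PicAChoo', 'features']
```

Because the pipeline is built from a plain list of names and arguments, the same filters can be recombined, for example from a configuration file, without editing the extraction code.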