Instance pruning by filtering uninformative words: an information extraction case study

Authors:
Alfio Massimiliano Gliozzo;Claudio Giuliano;Raffaella Rinaldi
Affiliations:
Istituto per la Ricerca Scientifica e Tecnologica, ITC-irst, Trento, Italy (all authors)
Venue:
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2005

Abstract

In this paper we present a novel instance pruning technique for Information Extraction (IE). The technique filters uninformative words out of texts, on the assumption that words that are very frequent in the language carry no specific information about the text in which they appear and are therefore very unlikely to be (part of) relevant entities. Experiments on two benchmark datasets show that computation time can be significantly reduced without any significant decrease in prediction accuracy. We also report an improvement in accuracy for one task.
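The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how word frequency is estimated or how the cut-off is chosen, so the background corpus, the relative-frequency measure, the threshold `max_rel_freq`, and the function names `relative_frequencies` and `prune_instances` are all assumptions made for illustration.

```python
from collections import Counter


def relative_frequencies(background_tokens):
    """Relative frequency of each lowercased token in a background corpus.

    Assumption: a plain unigram estimate stands in for "very frequent
    words in the language"; the abstract does not specify the measure.
    """
    counts = Counter(tok.lower() for tok in background_tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def prune_instances(tokens, rel_freq, max_rel_freq):
    """Split tokens into (kept, pruned) lists of (position, token) pairs.

    A token is pruned when its background relative frequency exceeds
    max_rel_freq (a hypothetical threshold). Pruned tokens would not be
    passed to the entity classifier; kept tokens would.
    """
    kept, pruned = [], []
    for position, tok in enumerate(tokens):
        if rel_freq.get(tok.lower(), 0.0) > max_rel_freq:
            pruned.append((position, tok))
        else:
            kept.append((position, tok))
    return kept, pruned


# Toy background corpus (an assumption; a real one would be far larger).
background = (
    "the seminar is in the hall and the speaker is at the door of the hall"
).split()
rel_freq = relative_frequencies(background)

sentence = "The seminar by Smith is in the hall".split()
kept, pruned = prune_instances(sentence, rel_freq, max_rel_freq=0.2)

print("kept:  ", kept)
print("pruned:", pruned)
```

On this toy input only the two occurrences of "the" exceed the threshold, so the classifier would see six candidate tokens instead of eight. Words absent from the background corpus, such as "Smith", get frequency 0.0 and are always kept, which matches the intuition that rare words are the likeliest entity candidates.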