Adapting the Naive Bayes classifier to rank procedural texts

Authors:
Ling Yin; Richard Power
Affiliations:
Natural Language Technology Group (NLTG), University of Brighton, Brighton, United Kingdom; Faculty of Mathematics and Computing, The Open University, Milton Keynes, United Kingdom
Venue:
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Year:
2006

Abstract

This paper presents a machine-learning approach for ranking web documents according to the proportion of procedural text they contain. By ‘procedural text’ we mean ordered lists of steps, which are very common in instructional genres such as online manuals. Our initial training corpus is built by applying simple heuristics to select documents from a large collection, and it contains only a few documents with a large proportion of procedural text. We adapt the Naive Bayes classifier to better fit this less-than-ideal training corpus. The adapted model is compared with several other classifiers on the task of ranking procedural texts using different feature sets, and it is shown to perform well when only highly distinctive features are used.
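
To make the ranking idea concrete, the following is a minimal sketch of how a standard multinomial Naive Bayes model can order documents by how strongly they resemble a "procedural" class. It is not the adapted model from the paper, whose details are not given in this abstract. The function names (`train_nb`, `log_odds`, `rank`), the Laplace smoothing parameter `alpha`, the bag-of-tokens features, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Fit a plain multinomial Naive Bayes model (label 1 = procedural, 0 = other).

    docs   : list of token lists (hypothetical bag-of-words features)
    labels : list of 0/1 class labels, one per document
    alpha  : Laplace smoothing constant (an assumed default)
    """
    vocab = {tok for doc in docs for tok in doc}
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(doc)

    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for c in (0, 1):
        total = sum(counts[c].values()) + alpha * len(vocab)
        model["log_prior"][c] = math.log(n_docs[c] / len(docs))
        model["log_lik"][c] = {
            t: math.log((counts[c][t] + alpha) / total) for t in vocab
        }
    return model


def log_odds(model, doc):
    """Log posterior odds of the procedural class; higher means more procedural."""
    score = model["log_prior"][1] - model["log_prior"][0]
    for tok in doc:
        if tok in model["vocab"]:  # tokens unseen in training are ignored
            score += model["log_lik"][1][tok] - model["log_lik"][0][tok]
    return score


def rank(model, docs):
    """Return document indices sorted from most to least procedural."""
    return sorted(range(len(docs)), key=lambda i: log_odds(model, docs[i]), reverse=True)


# Toy data, invented purely for illustration.
train_docs = [
    ["first", "click", "then", "press"],
    ["step", "click", "next", "press"],
    ["the", "history", "of", "computing"],
    ["this", "essay", "discusses", "history"],
]
train_labels = [1, 1, 0, 0]
model = train_nb(train_docs, train_labels)

candidates = [["history", "essay"], ["click", "press", "next"]]
ranking = rank(model, candidates)
```

Ranking by log odds rather than by a hard class decision matches the task described above, where documents are ordered by degree rather than sorted into two bins. Restricting `vocab` to a small set of highly distinctive tokens would be one way to experiment with the feature-selection effect the abstract mentions.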