This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for categorization of documents consisting of a collection of fields, or arbitrary tree-structured documents that can be adequately modeled with such a flat structure. The approaches range from trivial modifications of text modeling to more elaborate schemes, specifically tailored to structured documents. We combine these methods with three different text classification algorithms and evaluate their performance on four standard datasets containing different types of semi-structured documents. The best results were obtained with stacking, an approach in which predictions based on different structural components are combined by a meta classifier. A further improvement of this method is achieved by including the flat text model in the final prediction.
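
The stacking scheme summarized above can be illustrated with a small sketch. It is not the paper's implementation: the base learner (multinomial naive Bayes), the meta classifier (logistic regression), the leave-one-out generation of meta-level training data, the field names `title` and `body`, and the toy documents are all assumptions chosen to keep the example self-contained. One base model is trained per field, and the flat text of all fields is added as a further base model, mirroring the final variant mentioned in the abstract.

```python
import math
from collections import Counter

def train_nb(texts, labels):
    """Multinomial naive Bayes for binary labels 0/1 (assumed base learner)."""
    counts = {0: Counter(), 1: Counter()}
    for text, y in zip(texts, labels):
        counts[y].update(text.split())
    return counts, Counter(labels), set(counts[0]) | set(counts[1])

def nb_log_odds(model, text):
    """log P(y=1 | text) - log P(y=0 | text); out-of-vocabulary words are skipped."""
    counts, n_docs, vocab = model
    score = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + len(vocab)  # add-one smoothing
        score[y] = math.log(n_docs[y] + 1)            # smoothed class prior
        for w in text.split():
            if w in vocab:
                score[y] += math.log((counts[y][w] + 1) / total)
    return score[1] - score[0]

def views(doc, fields):
    """One text per structural component, plus the flat text of all fields."""
    parts = [doc[f] for f in fields]
    return parts + [" ".join(parts)]

def train_base(docs, labels, fields):
    columns = zip(*(views(d, fields) for d in docs))
    return [train_nb(list(col), labels) for col in columns]

def base_features(models, doc, fields):
    return [nb_log_odds(m, v) for m, v in zip(models, views(doc, fields))]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta(X, y, lr=0.1, epochs=300):
    """Logistic regression by batch gradient descent (assumed meta classifier)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errs = [yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
                for xi, yi in zip(X, y)]
        w = [wj + lr * sum(e * xi[j] for e, xi in zip(errs, X)) / len(X)
             for j, wj in enumerate(w)]
        b += lr * sum(errs) / len(X)
    return w, b

def train_stack(docs, labels, fields):
    # Meta-level training data: each document is scored by base models
    # that were trained without it (leave-one-out).
    X = []
    for i, doc in enumerate(docs):
        rest = train_base(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:], fields)
        X.append(base_features(rest, doc, fields))
    # Final base models are retrained on all documents.
    return train_base(docs, labels, fields), train_meta(X, labels)

def predict_proba(stack, doc, fields):
    models, (w, b) = stack
    x = base_features(models, doc, fields)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy corpus: label 1 = sports, label 0 = finance.
FIELDS = ["title", "body"]
docs = [
    {"title": "football match", "body": "team wins game"},
    {"title": "football final", "body": "team wins cup"},
    {"title": "match final", "body": "game cup team"},
    {"title": "stock market", "body": "shares fall price"},
    {"title": "stock bank", "body": "shares fall profit"},
    {"title": "market bank", "body": "price profit shares"},
]
labels = [1, 1, 1, 0, 0, 0]

stack = train_stack(docs, labels, FIELDS)
p_sports = predict_proba(stack, {"title": "football final", "body": "team wins game"}, FIELDS)
p_finance = predict_proba(stack, {"title": "stock bank", "body": "shares fall price"}, FIELDS)
```

Here the meta classifier sees three inputs per document: the log-odds from the title model, the body model, and the flat text model. Dropping the last element returned by `views` gives plain stacking over structural components only.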