Multi-evidence, multi-criteria, lazy associative document classification

Authors:
Adriano Veloso (Federal University of Minas Gerais, Belo Horizonte, Brazil)
Wagner Meira, Jr. (Federal University of Minas Gerais, Belo Horizonte, Brazil)
Marco Cristo (Federal University of Minas Gerais, Belo Horizonte, Brazil)
Marcos Gonçalves (Federal University of Minas Gerais, Belo Horizonte, Brazil)
Mohammed Zaki (Rensselaer Polytechnic Institute, Troy)
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Abstract

We present a novel approach for classifying documents that transparently combines different pieces of evidence (e.g., textual features of documents, links, and citations) through a data mining technique that generates rules associating these pieces of evidence with predefined classes. These rules can contain any number and mixture of the available evidence, and each rule is associated with several quality criteria that can be used in conjunction to choose the "best" rule to apply at classification time. Our method performs evidence enhancement by link forwarding/backwarding (i.e., navigating among documents related through citation), so that new pieces of link-based evidence are derived when necessary. Furthermore, instead of inducing a single model (or rule set) that is good on average for all predictions, the proposed approach employs a lazy method that delays the inductive process until a document is given for classification, thereby taking advantage of the better-quality evidence coming from the document itself. We conducted a systematic evaluation of the proposed approach using documents from the ACM Digital Library and from a Brazilian Web directory. In both collections, our approach outperformed all classifiers based on the best available evidence in isolation, as well as state-of-the-art multi-evidence classifiers. We also evaluated our approach on the standard WebKB collection, where it showed gains of 1% in accuracy while being 25 times faster. Further, our approach is extremely efficient computationally, showing gains of more than one order of magnitude when compared against other multi-evidence classifiers.
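
The lazy, rule-based idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the item prefixes ("t:" for terms, "c:" for citations), the function names, the rule-length cap, and the lexicographic ordering of criteria (confidence, then support, then rule length) are all assumptions made for illustration. The sketch only shows the general pattern of projecting the training set onto the evidence present in the document being classified, inducing class association rules from that projection, and picking one rule using several criteria together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy training set: (set of evidence items, class label).
# Items mix textual terms ("t:...") and citation links ("c:...").
TRAIN = [
    ({"t:mining", "t:rules", "c:doc7"}, "databases"),
    ({"t:mining", "t:patterns", "c:doc7"}, "databases"),
    ({"t:mining", "t:kernel", "c:doc3"}, "learning"),
    ({"t:kernel", "t:margin", "c:doc3"}, "learning"),
    ({"t:rules", "t:margin"}, "learning"),
]


def lazy_rules(train, doc_items, max_len=2):
    """Induce rules on demand, using only evidence found in the document."""
    # Projection: keep just the items each training example shares
    # with the document to classify.
    projected = [(items & doc_items, label) for items, label in train]

    body_count = Counter()   # how often each rule body occurs
    rule_count = Counter()   # how often each (body, label) pair occurs
    for items, label in projected:
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                body_count[body] += 1
                rule_count[(body, label)] += 1

    rules = []
    for (body, label), n in rule_count.items():
        rules.append({
            "body": body,
            "label": label,
            "support": n / len(train),          # fraction of training set
            "confidence": n / body_count[body],  # P(label | body), estimated
        })
    return rules


def classify(train, doc_items, default=None):
    """Pick one rule using several criteria in conjunction (assumed order)."""
    rules = lazy_rules(train, doc_items)
    if not rules:
        return default
    best = max(
        rules,
        key=lambda r: (r["confidence"], r["support"], len(r["body"])),
    )
    return best["label"]


if __name__ == "__main__":
    # The unseen term is ignored because no training example contains it.
    print(classify(TRAIN, {"t:mining", "c:doc7", "t:unseen"}))
```

Because rules are generated only from item combinations that occur in the document at hand, the rule set stays small and specific to each prediction, which is the intuition behind delaying induction until classification time. Evidence enhancement by following citations is not modeled here; under the same assumptions, it would amount to adding further "c:" items to the document's item set before the projection step.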