C4.5: programs for machine learning
C4.5: programs for machine learning
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Beyond market baskets: generalizing association rules to correlations
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Inferring Web communities from link topology
Proceedings of the ninth ACM conference on Hypertext and hypermedia : links, objects, time and space---structure in hypermedia systems: links, objects, time and space---structure in hypermedia systems
Constructing, organizing, and visualizing collections of topically related Web resources
ACM Transactions on Computer-Human Interaction (TOCHI)
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Web classification using support vector machine
Proceedings of the 4th international workshop on Web information and data management
A Study of Approaches to Hypertext Categorization
Journal of Intelligent Information Systems
Composite Kernels for Hypertext Categorisation
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Selecting the right interestingness measure for association patterns
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
CoBWeb A Crawler for the Brazilian Web
SPIRE '99 Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware
XRules: an effective structural classifier for XML data
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Combining link-based and content-based methods for web document classification
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Intelligent GP fusion from multiple sources for text classification
Proceedings of the 14th ACM international conference on Information and knowledge management
When are links useful? experiments in text classification
ECIR'03 Proceedings of the 25th European conference on IR research
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1
Exploiting temporal contexts in text classification
Proceedings of the 17th ACM conference on Information and knowledge management
Calibrated lazy associative classification
SBBD '08 Proceedings of the 23rd Brazilian symposium on Databases
Effective self-training author name disambiguation in scholarly digital libraries
Proceedings of the 10th annual joint conference on Digital libraries
Classifying documents with link-based bibliometric measures
Information Retrieval
Calibrated lazy associative classification
Information Sciences: an International Journal
Cost-effective on-demand associative author name disambiguation
Information Processing and Management: an International Journal
Active associative sampling for author name disambiguation
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
I-prune: Item selection for associative classification
International Journal of Intelligent Systems
SpaDeS: Detecting spammers at the source network
Computer Networks: The International Journal of Computer and Telecommunications Networking
Hi-index | 0.00 |
We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through a data mining technique which generates rules associating these pieces of evidence to predefined classes. These rules can contain any number and mixture of the available evidence and are associated with several quality criteria which can be used in conjunction to choose the "best" rule to be applied at classification time. Our method is able to perform evidence enhancement by link forwarding/backwarding (i.e., navigating among documents related through citation), so that new pieces of link-based evidence are derived when necessary. Furthermore, instead of inducing a single model (or rule set) that is good on average for all predictions, the proposed approach employs a lazy method which delays the inductive process until a document is given for classification, therefore taking advantage of better qualitative evidence coming from the document. We conducted a systematic evaluation of the proposed approach using documents from the ACM Digital Library and from a Brazilian Web directory. Our approach was able to outperform in both collections all classifiers based on the best available evidence in isolation as well as state-of-the-art multi-evidence classifiers. We also evaluated our approach using the standard WebKB collection, where our approach showed gains of 1% in accuracy, being 25 times faster. Further, our approach is extremely efficient in terms of computational performance, showing gains of more than one order of magnitude when compared against other multi-evidence classifiers.