Taxonomies over a set of features occur in many real-world domains. One example comes from paleontology, where the task is to determine the age of a fossil site from the taxa found in it. Because the fossil record is very noisy and has many gaps, the challenge is to consider taxa at a suitable level of aggregation: species, genus, family, etc. For example, some species can be well suited as features for the age prediction task, while for other parts of the taxonomy it would be better to use the genus level or even higher levels of the hierarchy. A default choice is to select a fixed level (typically species or genus); this misses the potential gain of choosing the proper level separately for each set of species. Motivated by this application, we study the problem of selecting from a taxonomy an antichain (a set of nodes in which no node is an ancestor of another) that covers all leaves and improves the prediction of a specified target variable. Our experiments on paleontological data show that choosing antichains leads to better predictions than fixing specific levels of the taxonomy beforehand.
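To make the notion of an antichain that covers all leaves concrete, here is a minimal Python sketch. It is an illustration only and not the selection method studied in the paper: the taxonomy, the node names (`family_F`, `genus_A`, `sp_a1`, and so on), and the function names are hypothetical, and the taxonomy is assumed to be a rooted tree stored as a child-to-parent mapping. The check relies on the observation that a set of nodes is a leaf-covering antichain exactly when every leaf-to-root path contains exactly one selected node.

```python
from itertools import product

# Hypothetical toy taxonomy (child -> parent); the root has no entry.
parent = {
    "genus_A": "family_F", "genus_B": "family_F",
    "sp_a1": "genus_A", "sp_a2": "genus_A",
    "sp_b1": "genus_B", "sp_b2": "genus_B",
}

def leaves(parent):
    """Nodes that never appear as a parent."""
    nodes = set(parent) | set(parent.values())
    return nodes - set(parent.values())

def is_covering_antichain(parent, selected):
    """True if each leaf-to-root path hits exactly one selected node."""
    selected = set(selected)
    for leaf in leaves(parent):
        hits, node = 0, leaf
        while node is not None:
            hits += node in selected
            node = parent.get(node)  # None once we step past the root
        if hits != 1:  # 0: leaf uncovered; >1: two comparable nodes
            return False
    return True

def covering_antichains(parent, root):
    """Enumerate all leaf-covering antichains (exponential in general)."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    def cuts(node):
        result = [{node}]  # option 1: aggregate everything at this node
        kids = children.get(node, [])
        if kids:  # option 2: combine one cut from each child subtree
            for combo in product(*(cuts(k) for k in kids)):
                result.append(set().union(*combo))
        return result

    return cuts(root)

mixed = {"genus_A", "sp_b1", "sp_b2"}  # genus level for A, species for B
print(is_covering_antichain(parent, mixed))
print(len(covering_antichains(parent, "family_F")))
```

In this toy tree, the fixed species level and the fixed genus level are just two of the candidate antichains; mixed selections such as `mixed` are the additional choices the antichain formulation allows. Brute-force enumeration is shown only to make the search space visible and would not scale to a realistic taxonomy.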