Determining the fitness of a document model by using conflict instances

  • Authors:
  • Ding-Yi Chen;Xue Li;Zhao Yang Dong;Xia Chen

  • Affiliations:
  • University of Queensland, Australia;University of Queensland, Australia;University of Queensland, Australia;University of Queensland, Australia

  • Venue:
  • ADC '05 Proceedings of the 16th Australasian database conference - Volume 39
  • Year:
  • 2005

Abstract

Documents cannot be classified automatically unless they are represented as a collection of computable features. A model is such a representation of a document. However, a model may not be sufficient to express a document: two documents with the same features do not necessarily belong to the same category. We propose a method for determining the fitness of a document model by using conflict instances. Conflict instances are instances with exactly the same features but different category labels, given by a human expert during an interactive document labelling process for training the classifier. In this paper, we treat conflict instances not as noise but as evidence that can reveal the distribution of positive instances. We develop an approach that represents this distribution information as a hyperplane, called the distribution hyperplane. The fitness problem then becomes a problem of computing the distribution hyperplane.

Besides determining the fitness of a model, the distribution hyperplane can also be used 1) as a classifier itself and 2) as a membership function of fuzzy sets. We also propose selection criteria for measuring the effectiveness of a model during fitness computation.
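To make the definition of a conflict instance concrete, the following minimal Python sketch groups labelled instances by their feature vector and reports the groups that carry more than one label. It also computes the fraction of positive labels in each conflicting group, as one simple way to read conflicts as distribution evidence. The function names, the toy data, and the "pos"/"neg" labels are assumptions made for illustration; this is not the paper's distribution hyperplane computation, which the abstract does not specify.

```python
from collections import defaultdict


def find_conflicts(instances):
    """Return feature vectors that appear with more than one distinct label.

    `instances` is an iterable of (features, label) pairs. The result maps
    each conflicting feature vector (as a tuple) to the list of labels it
    received, in input order.
    """
    groups = defaultdict(list)
    for features, label in instances:
        groups[tuple(features)].append(label)
    return {f: labels for f, labels in groups.items() if len(set(labels)) > 1}


def positive_ratio(labels, positive="pos"):
    """Fraction of labels in a group that equal the positive label."""
    return sum(1 for label in labels if label == positive) / len(labels)


# Hypothetical toy data: binary term-presence features with expert labels.
instances = [
    ((1, 0, 1), "pos"),
    ((1, 0, 1), "neg"),  # same features as above, different label
    ((1, 0, 1), "pos"),
    ((0, 1, 0), "neg"),
    ((0, 1, 1), "pos"),
]

conflicts = find_conflicts(instances)
ratios = {f: positive_ratio(labels) for f, labels in conflicts.items()}
```

Here only the feature vector `(1, 0, 1)` is a conflict instance, and two of its three labels are positive. A model under which many feature vectors show such mixed labels cannot separate the categories on its own, which is the intuition behind using conflicts to judge model fitness.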