Determining the fitness of a document model by using conflict instances

Authors:
Ding-Yi Chen;Xue Li;Zhao Yang Dong;Xia Chen
Affiliations:
University of Queensland, Australia;University of Queensland, Australia;University of Queensland, Australia;University of Queensland, Australia
Venue:
ADC '05 Proceedings of the 16th Australasian database conference - Volume 39
Year:
2005

Abstract

Documents cannot be automatically classified unless they are represented as a collection of computable features. A model is a representation of a document in terms of computable features. However, a model may not be sufficient to express a document: two documents with the same features do not necessarily belong to the same category. We propose a method for determining the fitness of a document model by using conflict instances. Conflict instances are instances with exactly the same features but different category labels, assigned by a human expert during an interactive document-labelling process for training the classifier. In this paper, we do not treat conflict instances as noise, but as evidence that can reveal the distribution of positive instances. We develop an approach that represents this distribution information as a hyperplane, called the distribution hyperplane. The fitness problem then becomes a problem of computing the distribution hyperplane. Besides determining the fitness of a model, the distribution hyperplane can also be used 1) as a classifier itself and 2) as a membership function for fuzzy sets. We also propose selection criteria for measuring the effectiveness of a model during fitness computation.
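
To make the notion of a conflict instance concrete, the following is a minimal sketch, not the method of the paper. It groups labelled instances by their feature vector and reports every feature vector that received more than one distinct label, together with the fraction of positive labels observed for it. The function name `find_conflicts`, the tuple-based feature encoding, and the toy data are all assumptions made for illustration; the construction of the distribution hyperplane itself is not reproduced here.

```python
from collections import defaultdict


def find_conflicts(instances):
    """Group labelled instances by feature vector and report conflicts.

    `instances` is an iterable of (features, label) pairs, where `features`
    is a hashable tuple and `label` is True for a positive instance.
    Returns {features: positive_fraction} for every feature vector that
    received more than one distinct label.
    """
    labels_by_features = defaultdict(list)
    for features, label in instances:
        labels_by_features[features].append(bool(label))

    conflicts = {}
    for features, labels in labels_by_features.items():
        if len(set(labels)) > 1:  # same features, different labels
            conflicts[features] = sum(labels) / len(labels)
    return conflicts


# Hypothetical toy data: binary term-presence features with expert labels.
labelled = [
    ((1, 0, 1), True),
    ((1, 0, 1), False),
    ((1, 0, 1), True),
    ((1, 0, 1), True),
    ((0, 1, 0), False),
    ((0, 1, 0), False),
]
conflicts = find_conflicts(labelled)
```

In this toy set only the feature vector `(1, 0, 1)` is a conflict instance, since `(0, 1, 0)` is labelled consistently. The per-vector positive fraction is one simple way to summarise how the positive labels are distributed over identical feature vectors, which is the kind of evidence the abstract proposes to exploit rather than discard as noise.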