Documents cannot be classified automatically unless they are represented as a collection of computable features. A model is such a representation of a document. However, a model may not be sufficient to express a document: two documents with the same features do not necessarily belong to the same category. We propose a method for determining the fitness of a document model by using conflict instances. Conflict instances are instances with exactly the same features but different category labels, assigned by a human expert during the interactive document-labelling process used to train the classifier. In this paper, we treat conflict instances not as noise but as evidence that can reveal the distribution of positive instances. We develop an approach that represents this distribution information as a hyperplane, which we call the distribution hyperplane. The fitness problem then becomes the problem of computing the distribution hyperplane. Besides determining the fitness of a model, the distribution hyperplane can also 1) act as a classifier itself and 2) serve as a membership function for fuzzy sets. We also propose selection criteria for measuring the effectiveness of a model during fitness computation.
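The notion of a conflict instance can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the paper's procedure: the function names `find_conflicts` and `positive_ratio`, the binary labels (1 for positive, 0 for negative), and the toy data are all hypothetical. It groups labelled instances by their feature vector, keeps the groups that carry more than one distinct label, and reports the fraction of positive labels in each group as one simple reading of the "distribution of positive instances". The computation of the distribution hyperplane itself is not described in the abstract and is not attempted here.

```python
from collections import defaultdict


def find_conflicts(instances):
    """Return feature vectors that occur with more than one distinct label.

    `instances` is an iterable of (features, label) pairs. The result maps
    each conflicting feature vector (as a tuple) to the list of labels
    observed for it, in input order.
    """
    labels_by_features = defaultdict(list)
    for features, label in instances:
        labels_by_features[tuple(features)].append(label)
    return {
        features: labels
        for features, labels in labels_by_features.items()
        if len(set(labels)) > 1
    }


def positive_ratio(labels, positive=1):
    """Fraction of labels equal to `positive` (assumed encoding: 1 = positive)."""
    return sum(1 for label in labels if label == positive) / len(labels)


# Hypothetical toy data: binary term-presence features with expert labels.
instances = [
    ((1, 0, 1), 1),
    ((1, 0, 1), 0),  # same features as above, different label: a conflict
    ((1, 0, 1), 1),
    ((0, 1, 0), 0),
    ((0, 1, 0), 0),
    ((1, 1, 0), 1),
]

conflicts = find_conflicts(instances)
ratios = {features: positive_ratio(labels) for features, labels in conflicts.items()}
```

In this toy set only the feature vector `(1, 0, 1)` is a conflict instance, with two of its three labels positive. A model under which many instances collide in this way would, by the abstract's argument, be a poorer fit for the documents.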