The validation of a software product is a fundamental part of its development, and focuses on analysing whether the software correctly resolves the problems it was designed to tackle. Traditional approaches to validation are based on comparing results against what is called a gold standard. Nevertheless, in certain domains it is not always easy, or even possible, to establish such a standard. This is the case with intelligent systems that endeavour to simulate or emulate a model of expert behaviour. This article describes the validation of the intelligent system computer-aided foetal evaluator (CAFE), developed for intelligent monitoring of the antenatal condition based on data from the non-stress test (NST), and how this validation was accomplished through a methodology designed to resolve the problem of validating intelligent systems. System performance was compared with that of three obstetricians using 3450 min of cardiotocographic (CTG) records corresponding to 53 different patients. Different parameters were extracted and interpreted from these records, and the validation was thus carried out on a parameter-by-parameter basis using measurement techniques such as percentage agreement, the Kappa statistic, and cluster analysis. Results showed that the system's agreement with the experts is, in general, similar to the agreement between the experts themselves, which permits the system to be considered at least as skilful as the experts. Throughout the article, the results obtained are discussed with a view to demonstrating how different measures of the level of agreement between system and experts can assist not only in assessing the aptness of a system, but also in highlighting its weaknesses. This kind of assessment means that the system can be fine-tuned repeatedly until the expected results are obtained.
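As a minimal sketch of the agreement measures mentioned above, the following Python snippet computes percentage agreement and Cohen's Kappa between two raters over a set of categorical judgements. The rating labels and the example data are hypothetical, not taken from the study; the formulas (observed agreement p_o, chance agreement p_e, kappa = (p_o - p_e) / (1 - p_e)) are the standard definitions.

```python
from collections import Counter

def percentage_agreement(a, b):
    """Fraction of items on which two raters assign the same label."""
    assert len(a) == len(b) and len(a) > 0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's Kappa: observed agreement corrected for chance agreement.

    kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement expected
    if both raters labelled independently with their marginal frequencies.
    """
    n = len(a)
    p_o = percentage_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_e = sum(counts_a[lab] * counts_b[lab]
              for lab in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for one CTG parameter, rated by system and one expert
system = ["normal", "normal", "suspect", "normal", "pathological", "suspect"]
expert = ["normal", "suspect", "suspect", "normal", "pathological", "normal"]

print(round(percentage_agreement(system, expert), 3))  # raw agreement
print(round(cohens_kappa(system, expert), 3))          # chance-corrected
```

A kappa near 0 indicates agreement no better than chance, while values approaching 1 indicate strong agreement; comparing system-expert kappa against expert-expert kappa, as in the study, shows whether the system falls within the experts' own range of variability.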