A new challenge for compression algorithms: genetic sequences
Information Processing and Management: an International Journal - Special issue: data compression
Topic-conditioned novelty detection
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
A System for new event detection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Towards parameter-free data mining
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
ROCR: visualizing classifier performance in R
Bioinformatics
Using names and topics for new event detection
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
IEEE Transactions on Information Theory
IEEE Transactions on Information Theory
IEEE Transactions on Information Theory
Hi-index | 0.00 |
Reviewing a patient history can be very time consuming, partly because of the large number of consultation notes. Often, most of the notes contain little new information. Tools facilitating this and other tasks could be constructed if we had the ability to automatically detect the novel notes. We propose the use of measures based on text compression, as an approximation of Kolmogorov complexity, for classifying note novelty. We define four compression-based and eight other measures. We evaluate their ability to predict the presence of previously unseen diagnosis codes associated with the notes in patient histories from general practice. The best measures show promising classification ability, which, while not enough to serve alone as a clinical tool, might be useful as part of a system taking more data types into account. The best individual measure was the normalized asymmetric compression distance between the concatenated prior notes and the current note.