In this paper, we propose a new term weighting scheme called Term Frequency-Inverse Corpus Frequency (TF-ICF). It does not require term frequency information from other documents in the collection, which allows the document vectors of N streaming documents to be generated in linear time. We evaluated the proposed approach in the context of a machine learning application, unsupervised document clustering, comparing it against five widely used term weighting schemes through extensive experimentation. Our results show that TF-ICF produces document clusters comparable in quality to those generated by the widely recognized term weighting schemes, and that it is significantly faster than those methods.
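
The abstract does not state the weighting formula, so the following is only a minimal sketch of the idea under stated assumptions: each term's weight is a log-scaled term frequency multiplied by an inverse frequency taken from a fixed, precomputed reference corpus rather than from the collection being processed. The exact formula, the function names, and the reference statistics (`REFERENCE_DOC_FREQ`, `REFERENCE_SIZE`) are hypothetical and chosen for illustration. Because each document is weighted using only its own tokens and a static lookup table, N streaming documents can be vectorized in a single pass.

```python
import math
from collections import Counter

# Hypothetical statistics from a static reference corpus (assumed, not from the paper):
# number of reference documents containing each term, and the corpus size.
REFERENCE_DOC_FREQ = {"the": 950, "cluster": 40, "gpu": 5}
REFERENCE_SIZE = 1000


def tf_icf_vector(tokens, corpus_doc_freq, corpus_size):
    """Weight one document's terms using only its own counts and fixed corpus statistics.

    Assumed form: log(1 + tf) * log((corpus_size + 1) / (df + 1)).
    Terms absent from the reference corpus get df = 0, hence the largest weight.
    """
    counts = Counter(tokens)
    return {
        term: math.log(1 + tf)
        * math.log((corpus_size + 1) / (corpus_doc_freq.get(term, 0) + 1))
        for term, tf in counts.items()
    }


def vectorize_stream(docs):
    """Yield one sparse vector per incoming document, with no look-back at earlier documents."""
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenizer, for illustration only
        yield tf_icf_vector(tokens, REFERENCE_DOC_FREQ, REFERENCE_SIZE)


vectors = list(vectorize_stream(["the gpu cluster", "the the the"]))
```

In this sketch, a term that is rare in the reference corpus (`gpu`) receives a higher weight than a common one (`the`), and no pass over the incoming collection is needed before weights can be computed, which is the property the abstract credits for the linear running time.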