Ranking and clustering of publication databases are often used to discover useful information about research areas. NetClus is an iterative algorithm for clustering heterogeneous star-schema information networks that incorporates the ranking information of individual data types. The algorithm has been evaluated using the DBLP database. In this paper, we apply NetClus to PubMed, a free database of articles on life sciences and biomedical topics, to discover key aspects of cancer research. The absence of unique author identifiers in PubMed introduces additional challenges. To address this, we introduce an improved author disambiguation technique that combines affiliation string normalisation based on the vector space model with co-author networks. Our disambiguation technique, which offers higher accuracy than existing techniques, significantly improves NetClus clustering results.
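As a rough illustration of how affiliation similarity in a vector space model might be combined with co-author overlap, the following minimal sketch compares records that share an author name. It is not the paper's implementation: the record fields (`affiliation`, `coauthors`), the smoothed TF-IDF weighting, the 0.5 threshold, and the rule "shared co-author OR similar affiliation" are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(affiliation):
    """Lowercase an affiliation string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", affiliation.lower())


def tfidf_vectors(affiliations):
    """Build one sparse TF-IDF vector (a dict) per affiliation string."""
    docs = [Counter(tokenize(a)) for a in affiliations]
    n = len(docs)
    # Document frequency: number of strings in which each term occurs.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumed weighting) keeps every weight positive.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_author(rec_a, rec_b, vec_a, vec_b, threshold=0.5):
    """Hypothetical decision rule: a shared co-author or similar affiliations."""
    shared = set(rec_a["coauthors"]) & set(rec_b["coauthors"])
    return bool(shared) or cosine(vec_a, vec_b) >= threshold


# Invented records that all carry the same ambiguous author name.
records = [
    {"affiliation": "Dept. of Oncology, University of Sydney, Australia",
     "coauthors": ["A Lee"]},
    {"affiliation": "University of Sydney, Department of Oncology, Sydney",
     "coauthors": ["B Chan"]},
    {"affiliation": "Institute of Physics, Beijing",
     "coauthors": ["C Wang"]},
    {"affiliation": "Cancer Research Centre, Melbourne",
     "coauthors": ["A Lee", "D Park"]},
]
vectors = tfidf_vectors([r["affiliation"] for r in records])
```

In this toy data, the first two records are linked by affiliation similarity even though the strings differ in word order and abbreviation, while the first and fourth are linked only through the shared co-author. A fuller pipeline would also expand abbreviations such as "Dept." before vectorising and would merge pairwise matches into author clusters.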