Citation mining: integrating text mining and bibliometrics for research user profiling

  • Authors:
  • Affiliations:
  • Venue: Journal of the American Society for Information Science and Technology
  • Year: 2001

Abstract

Identifying the users and impact of research is important for research performers, managers, evaluators, and sponsors. It is important to know whether the audience reached is the audience desired. It is useful to understand the technical characteristics of the other research/development/applications impacted by the originating research, and to understand other characteristics (names, organizations, countries) of the users impacted by the research. Because of the many indirect pathways through which fundamental research can impact applications, identifying the user audience and the research impacts can be very complex and time consuming. The purpose of this article is to describe a novel approach for identifying the pathways through which research can impact other research, technology development, and applications, and to identify the technical and infrastructure characteristics of the user population. A novel literature-based approach was developed to identify the user community and its characteristics. The research performed is characterized by one or more articles accessed by the Science Citation Index (SCI) database, because the SCI's citation-based structure enables the capability to perform citation studies easily. The user community is characterized by the articles in the SCI that cite the original research articles, and that cite the succeeding generations of these articles as well. Text mining is performed on the citing articles to identify the technical areas impacted by the research, the relationships among these technical areas, and relationships among the technical areas and the infrastructure (authors, journals, organizations). A key component of text mining, concept clustering, was used to provide both a taxonomy of the citing articles' technical themes and further technical insights based on theme relationships arising from the grouping process. Bibliometrics is performed on the citing articles to profile the user characteristics.
Citation Mining, this integration of citation bibliometrics and text mining, is applied to the 307 first generation citing articles of a fundamental physics article on the dynamics of vibrating sand-piles. Most of the 307 citing articles were basic research whose main themes were aligned with those of the cited article. However, about 20% of the citing articles were research or development in other disciplines, or development within the same discipline. The text mining alone identified the intradiscipline applications and extradiscipline impacts and applications; this was confirmed by detailed reading of the 307 abstracts. The combination of citation bibliometrics and text mining provides a synergy unavailable with each approach taken independently. Furthermore, text mining is a REQUIREMENT for a feasible comprehensive research impact determination. The integrated multigeneration citation analysis required for broad research impact determination of highly cited articles will produce thousands or tens or hundreds of thousands of citing article Abstracts. Text mining allows the impacts of research on advanced development categories and/or extradiscipline categories to be obtained without having to read all these citing article Abstracts. The multifield bibliometrics provide multiple documented perspectives on the users of the research, and indicate whether the documented audience reached is the desired target audience.
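
The two bookkeeping steps described in the abstract, gathering successive generations of citing articles and then profiling them bibliometrically, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the citation map `CITED_BY`, the `RECORDS` table, the field names, and the functions `citing_generations` and `profile` are all hypothetical and chosen only for illustration, and the toy data does not come from the SCI. The text mining (concept clustering) step is not shown.

```python
from collections import Counter, deque

# Hypothetical toy data: paper id -> ids of the papers that cite it.
CITED_BY = {
    "P0": ["A1", "A2"],
    "A1": ["B1"],
    "A2": ["B1", "B2"],
}

# Hypothetical bibliographic records for the citing papers.
RECORDS = {
    "A1": {"country": "Country X", "journal": "Journal A"},
    "A2": {"country": "Country Y", "journal": "Journal A"},
    "B1": {"country": "Country X", "journal": "Journal B"},
    "B2": {"country": "Country Z", "journal": "Journal C"},
}


def citing_generations(root, cited_by, max_generation):
    """Breadth-first walk returning {paper_id: generation} for citing papers."""
    generation = {}
    queue = deque([(root, 0)])
    while queue:
        paper, depth = queue.popleft()
        if depth == max_generation:
            continue  # do not expand beyond the requested generation
        for citer in cited_by.get(paper, []):
            if citer != root and citer not in generation:
                generation[citer] = depth + 1
                queue.append((citer, depth + 1))
    return generation


def profile(paper_ids, records, field):
    """Count how often each value of `field` occurs among the given papers."""
    return Counter(records[p][field] for p in paper_ids if p in records)


generations = citing_generations("P0", CITED_BY, max_generation=2)
first_gen = [p for p, g in generations.items() if g == 1]

print(generations)
print(profile(first_gen, RECORDS, "journal"))
print(profile(generations, RECORDS, "country"))
```

Each paper is assigned the earliest generation in which it appears, so an article that cites both a first-generation and a second-generation paper is counted once. Whether the original study resolved such overlaps this way is not stated in the abstract.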