Towards the automation of address identification

  • Authors:
  • Fernanda Morillo; Javier Aparicio; Borja González-Albo; Luz Moreno

  • Affiliations:
  • Instituto de Estudios Documentales sobre Ciencia y Tecnología (IEDCYT), Centro de Ciencias Humanas y Sociales (CCHS), Spanish National Research Council (CSIC), Madrid, Spain 28037

  • Venue:
  • Scientometrics
  • Year:
  • 2013

Abstract

A new semi-automatic method is presented to standardize or codify addresses, in order to produce bibliometric indicators from bibliographic databases. The hypothesis is that this method normalizes authors' addresses reliably and that its results are easy and quick to obtain. To test the method, a set of already hand-coded data was chosen to verify its reliability: 136,821 Spanish documents (2006-2008) previously downloaded from the Web of Science database. Unique addresses from this set were selected to produce a list of keywords representing various institutional sectors. Once the list of terms was obtained, the addresses were standardized with this information and the result was compared to the previous hand-coded data. Tests were carried out to analyze the possible association between both systems (automatic and hand-coding), calculating measures of recall and precision as well as some directional and symmetric statistical measures. The outcome shows a good relation between both methods. Although these results are quite general, this overview of institutional sectors is a good starting point for a second approach aimed at selecting particular centers. The system is novel in that it does not depend on previously existing master lists or tables, and it contributes to the automation of tasks. The validity of the hypothesis is confirmed not only by the statistical measures but also by the fact that obtaining general and detailed scientific output is less time-consuming, and will become even less so as the master tables are fed back and reused for the same kind of data. The same method could be used with any country and/or database by creating a new master list that takes into account their specific characteristics.
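
The workflow described in the abstract (assign a sector to each address from a keyword list, then compare against hand-coded labels using precision and recall) can be illustrated with a minimal sketch. This is not the authors' implementation: the sector names, the keywords, the sample addresses, and the function names `classify_address` and `precision_recall` are all hypothetical, and a simple first-match substring rule is assumed for the matching step.

```python
# Hypothetical keyword list mapping each institutional sector to terms.
# Illustrative only; the paper's actual master list is not reproduced here.
SECTOR_KEYWORDS = {
    "University": ["univ"],
    "Hospital": ["hosp"],
    "CSIC": ["csic"],
}


def classify_address(address, keywords=SECTOR_KEYWORDS):
    """Return the first sector whose keyword occurs in the address, else None."""
    text = address.lower()
    for sector, terms in keywords.items():
        if any(term in text for term in terms):
            return sector
    return None


def precision_recall(predicted, gold, sector):
    """Compute precision and recall of the automatic labels for one sector."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == sector and g == sector)
    fp = sum(1 for p, g in pairs if p == sector and g != sector)
    fn = sum(1 for p, g in pairs if p != sector and g == sector)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up addresses with made-up hand-coded sectors, for illustration only.
addresses = [
    "Univ Complutense Madrid, Fac Med, Madrid, Spain",
    "Hosp Univ La Paz, Madrid, Spain",
    "CSIC, Inst Ciencia Mat, Madrid, Spain",
    "Hosp Clin Barcelona, Spain",
]
hand_coded = ["University", "Hospital", "CSIC", "Hospital"]

automatic = [classify_address(a) for a in addresses]
for sector in SECTOR_KEYWORDS:
    p, r = precision_recall(automatic, hand_coded, sector)
    print(f"{sector}: precision={p:.2f} recall={r:.2f}")
```

In this toy data the university hospital address matches the "univ" keyword first and is assigned to the wrong sector, which lowers precision for University and recall for Hospital. Ambiguous addresses of this kind are one reason to compare automatic results against hand-coded data.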