Linkage of compound objects for supporting maintenance of large-scale web sites

  • Authors:
  • Yuya Hirano;Qiang Ma;Masatoshi Yoshikawa

  • Affiliations:
  • Kyoto University, Yoshida-Honmachi, Sakyo, Kyoto, Japan;Kyoto University, Yoshida-Honmachi, Sakyo, Kyoto, Japan;Kyoto University, Yoshida-Honmachi, Sakyo, Kyoto, Japan

  • Venue:
  • Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2014

Abstract

Departments of organizations such as companies and universities tend to publish various information on their own Web sites. For example, descriptions of the members of a certain laboratory at a university may appear on the laboratory's Web site, the department's Web site, and so on. However, inconsistencies may occur between the descriptions on these sites if their update timings and management policies differ. Such inconsistencies are not easy to find on large-scale Web sites, and the maintenance cost of doing so is huge. Record linkage techniques have been developed to determine whether two entities represented as relational records are approximately the same. Current methods focus on simple objects, which are represented by individual records. However, objects often consist of numerous simple objects; that is, they are often compound objects. For example, a research team object may contain several researcher objects. In this case, the research team object is a compound object, and the individual researcher objects are simple objects. Current record-level linkage methods cannot detect such compound objects correctly when a record of one compound object does not match the corresponding record of the other. We propose novel methods of linking compound objects to support the maintenance of large-scale Web sites. We first extract the relational records of Web objects by exploiting the structure of the Web pages on which they appear and the linguistic features of their descriptions. Then, after the record-level linkage, we find linkable compound objects made up of simple objects by examining features of the compound objects, namely record continuity, common attribute values, and co-occurrences. Experimental results show that our method can detect compound objects that cannot be detected by record-level linkage alone.
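
To make the distinction between record-level and compound-level linkage concrete, the following is a minimal sketch, not the authors' method. The abstract does not define how record continuity, common attribute values, or co-occurrences are computed, so the sketch substitutes a much simpler stand-in: two compound objects are scored by the fraction of their records that link at the record level. The function names, the 0.85 threshold, and the sample data are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def record_similarity(a, b):
    """Mean string similarity over the attributes that two records share."""
    keys = set(a) & set(b)
    if not keys:
        return 0.0
    total = sum(
        SequenceMatcher(None, str(a[k]).lower(), str(b[k]).lower()).ratio()
        for k in keys
    )
    return total / len(keys)


def link_records(left, right, threshold=0.85):
    """Record-level linkage: index pairs whose similarity meets the threshold.

    The threshold is a hypothetical value chosen for this example.
    """
    return [
        (i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
        if record_similarity(a, b) >= threshold
    ]


def compound_score(left, right, threshold=0.85):
    """Stand-in compound-level score (an assumption, not the paper's features):
    the share of records in the smaller compound object that have a
    record-level match in the other one."""
    if not left or not right:
        return 0.0
    pairs = link_records(left, right, threshold)
    matched_left = {i for i, _ in pairs}
    matched_right = {j for _, j in pairs}
    return min(len(matched_left), len(matched_right)) / min(len(left), len(right))


# Hypothetical data: the same research team as listed on two sites,
# where the department's page has a different third member.
lab_site = [
    {"name": "Alice Smith", "role": "Professor"},
    {"name": "Bob Jones", "role": "Assistant Professor"},
    {"name": "Carol White", "role": "Student"},
]
dept_site = [
    {"name": "Alice Smith", "role": "Professor"},
    {"name": "Bob Jones", "role": "Assistant Professor"},
    {"name": "Dan Brown", "role": "Student"},
]

record_links = link_records(lab_site, dept_site)
team_score = compound_score(lab_site, dept_site)
print(record_links, team_score)
```

In this example the third record fails to link at the record level, yet two of the three records match, so the two team listings can still be treated as the same compound object and the unmatched record flagged as a possible inconsistency. A fuller implementation would replace the matched-fraction score with the compound-object features named in the abstract.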