A comparative analysis framework for semi-structured documents, with applications to government regulations

  • Authors:
  • Kincho H. Law;Gloria T. Lau

  • Venue:
  • A comparative analysis framework for semi-structured documents, with applications to government regulations
  • Year:
  • 2004

Abstract

The complexity and diversity of government regulations make understanding and retrieving them a non-trivial task. One issue is the existence of multiple sources of regulations and interpretive guides that differ in format, terminology and context. In this work, an information infrastructure is proposed for regulation management and analysis, which includes a consolidated document repository and tools for similarity analysis. The corpus covers accessibility and environmental regulations from the US Federal government, the California state government, non-profit organizations and some European agencies. The regulatory repository is to be populated with regulations in XML format. XML is chosen as the representation format because it is well suited to semi-structured data such as legal documents. A shallow parser is developed to consolidate regulations published in different formats, for example PDF or HTML, into XML. The shallow parser also extracts important features, such as concepts, measurements and definitions, and incorporates them into the XML structure. With a well-formed regulatory repository in place, analysis tools are developed to help retrieve related provisions from different domains of regulations. The theory and implementation of a relatedness analysis framework are presented. The goal is to identify the most strongly related provisions using not only a traditional term match but also a combination of feature matches, and not only content comparison but also structural analysis. Regulations are first compared based on conceptual information as well as domain knowledge through a combination of feature matching. Regulations also possess specific structures, such as a tree hierarchy of provisions and a referential structure. These structures carry useful information for locating related provisions, and are therefore exploited in the analysis for a more complete comparison.
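As an illustration only, the idea of combining feature matches with structural analysis might be sketched as below. The abstract does not give the scoring formulas, so everything here is an assumption: the cosine measure, the feature names `concepts` and `measurements`, the weights, the blending factor `alpha`, and the provision identifiers are all hypothetical, and the structural step looks only at parent provisions in the tree hierarchy.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of feature values."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def base_score(p: dict, q: dict, weights: dict) -> float:
    """Weighted average of per-feature similarities (content comparison)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("feature weights must sum to a positive value")
    return sum(
        w * cosine(Counter(p.get(f, [])), Counter(q.get(f, [])))
        for f, w in weights.items()
    ) / total


def refined_score(p_id, q_id, provisions, parent, weights, alpha=0.3):
    """Blend a pair's own score with the score of their parent provisions.

    Assumed structural step: if both provisions have a parent in the
    tree hierarchy, the parents' similarity contributes a share `alpha`.
    """
    own = base_score(provisions[p_id], provisions[q_id], weights)
    pp, qp = parent.get(p_id), parent.get(q_id)
    if pp is None or qp is None:
        return own
    context = base_score(provisions[pp], provisions[qp], weights)
    return (1 - alpha) * own + alpha * context


# Made-up provisions from two hypothetical regulations, A and B.
provisions = {
    "A.1": {"concepts": ["accessible", "route"], "measurements": []},
    "A.1.1": {"concepts": ["ramp", "slope"], "measurements": ["1:12"]},
    "B.2": {"concepts": ["accessible", "route"], "measurements": []},
    "B.2.1": {"concepts": ["ramp", "handrail"], "measurements": ["1:12"]},
}
parent = {"A.1.1": "A.1", "B.2.1": "B.2"}
weights = {"concepts": 0.7, "measurements": 0.3}

if __name__ == "__main__":
    print(base_score(provisions["A.1.1"], provisions["B.2.1"], weights))
    print(refined_score("A.1.1", "B.2.1", provisions, parent, weights))
```

In this sketch a feature that is empty on either side contributes zero similarity, which is one possible design choice among several; a referential structure could be handled in the same way by blending in the scores of cited provisions.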
System performance is evaluated by comparing a similarity ranking produced by users with the machine-predicted ranking. The ranking produced by the relatedness analysis system shows a lower error than that produced by Latent Semantic Indexing. Various pairs of regulations are compared, and the results are analyzed along with observations on the effect of using different features. An example e-rulemaking scenario is shown to demonstrate the capabilities of the prototype system.
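The abstract does not state which error measure is used to compare the user ranking with the machine-predicted ranking. A minimal sketch, assuming a mean absolute difference in rank position (the function name `rank_error` and the provision labels are hypothetical), could look like this:

```python
def rank_error(user_ranking, machine_ranking):
    """Mean absolute difference in rank position between two rankings.

    Both arguments list the same items, most related first. This metric
    is an assumption for illustration, not the one used in the work.
    """
    if not user_ranking:
        raise ValueError("rankings must not be empty")
    if len(set(user_ranking)) != len(user_ranking):
        raise ValueError("rankings must not contain duplicates")
    if (len(user_ranking) != len(machine_ranking)
            or set(user_ranking) != set(machine_ranking)):
        raise ValueError("rankings must contain the same items")
    machine_pos = {item: i for i, item in enumerate(machine_ranking)}
    total = sum(abs(i - machine_pos[item]) for i, item in enumerate(user_ranking))
    return total / len(user_ranking)


if __name__ == "__main__":
    users = ["p1", "p2", "p3"]            # made-up user ranking
    system_a = ["p1", "p3", "p2"]         # made-up machine ranking
    system_b = ["p3", "p2", "p1"]         # made-up baseline ranking
    print(rank_error(users, system_a), rank_error(users, system_b))
```

Under this assumed measure, a lower value means the machine ranking sits closer to the users' ranking, so two systems can be compared by computing the error of each against the same user ranking.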