Algorithm for Extracting Loosely Structured Data Records Through Digging Strict Patterns

  • Authors:
  • Qing Li;Jing Chen;Yipu Wu

  • Affiliations:
  • Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong;Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong;Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

  • Venue:
  • World Wide Web
  • Year:
  • 2009

Abstract

Extracting loosely structured data records (LSDRs) has wide applications in many domains, such as forum pattern recognition, weblog data analysis, and book and news review analysis. However, existing methods work well only for strongly structured data records (SDRs). In this paper, we address the problem of extracting LSDRs by mining strict patterns. Our method uses both content features and tag-tree features to recognize LSDRs, and we propose a new algorithm that extracts the data records (DRs) automatically. The experimental results demonstrate that our algorithm extracts LSDRs effectively, with higher precision and recall.