Extracting loosely structured data records (LSDRs) has wide applications in many domains, such as forum pattern recognition, weblog data analysis, and book and news review analysis. However, existing methods work well only for strongly structured data records (SDRs). In this paper, we address the problem of extracting LSDRs by mining strict patterns. Our method uses both content features and tag-tree features to recognize LSDRs, and we propose a new algorithm that extracts the data records (DRs) automatically. Experimental results demonstrate that our algorithm extracts LSDRs effectively, with higher precision and recall.