Machine learning techniques for data extraction from semistructured sources exhibit different precision and recall characteristics. However, to date the formal relationship between learning algorithms and their impact on these two metrics remains unexplored. This paper proposes a formalization of precision and recall of extraction and investigates the complexity-theoretic aspects of learning algorithms for multi-attribute data extraction based on this formalism. We show that there is a tradeoff between precision/recall of extraction and computational efficiency, and present experimental results to demonstrate the practical utility of these concepts in designing scalable data extraction algorithms for improving recall without compromising precision.
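
For readers unfamiliar with the two metrics, the following is a minimal sketch of the standard set-based definitions of precision and recall, applied to extracted attribute values. It is not the formalization proposed in the paper, which is not reproduced in this abstract. The function name `precision_recall`, the (attribute, value) pair representation, and the convention of returning 1.0 for an empty set are all assumptions made for illustration.

```python
def precision_recall(extracted, relevant):
    """Standard set-based precision and recall (illustrative only).

    extracted: items the extractor returned (hypothetical representation)
    relevant:  items that should have been returned (ground truth)
    """
    extracted, relevant = set(extracted), set(relevant)
    correct = extracted & relevant
    # Assumed convention: an empty denominator yields 1.0.
    precision = len(correct) / len(extracted) if extracted else 1.0
    recall = len(correct) / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical multi-attribute example: (attribute, value) pairs.
extracted = {("title", "A"), ("price", "10"), ("price", "12")}
relevant = {("title", "A"), ("price", "10"), ("author", "X")}
precision, recall = precision_recall(extracted, relevant)
```

Here two of the three extracted pairs are correct and two of the three relevant pairs are found, so both metrics equal 2/3. An extractor that returns more candidates can raise recall at the cost of precision, which is the tension the abstract refers to.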