Probabilistic reasoning in intelligent systems: networks of plausible inference
Probabilistic reasoning in intelligent systems: networks of plausible inference
Automatic segmentation of text into structured records
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Information Extraction with HMM Structures Learned by Stochastic Optimization
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Mining reference tables for automatic text segmentation
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Integrating Unstructured Data into Relational Databases
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Answering Imprecise Queries over Autonomous Web Databases
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Information extraction from research papers using conditional random fields
Information Processing and Management: an International Journal
Information Processing and Management: an International Journal
FLUX-CIM: flexible unsupervised extraction of citation metadata
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Context-aware wrapping: synchronized data extraction
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Foundations and Trends in Databases
A flexible approach for extracting metadata from bibliographic citations
Journal of the American Society for Information Science and Technology
Reinforcement learning: a survey
Journal of Artificial Intelligence Research
Unsupervised strategies for information extraction by text segmentation
Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research
A probabilistic approach for automatically filling form-based web interfaces
Proceedings of the VLDB Endowment
Harvesting relational tables from lists on the web
The VLDB Journal — The International Journal on Very Large Data Bases
Joint unsupervised structure discovery and information extraction
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Semi-supervised multi-task learning of structured prediction models for web information extraction
Proceedings of the 20th ACM international conference on Information and knowledge management
Web-based closed-domain data extraction on online advertisements
Information Systems
ADC '13 Proceedings of the Twenty-Fourth Australasian Database Conference - Volume 137
Hi-index | 0.00 |
Information extraction by text segmentation (IETS) applies to cases in which data values of interest are organized in implicit semi-structured records available in textual sources (e.g. postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. In this paper we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS. As other unsupervised IETS approaches, ONDUX relies on information available on pre-existing data to associate segments in the input string with attributes of a given domain. Unlike other approaches, we rely on very effective matching strategies instead of explicit learning strategies. The effectiveness of this matching strategy is also exploited to disambiguate the extraction of certain attributes through a reinforcement step that explores sequencing and positioning of attribute values directly learned on-demand from test data, with no previous human-driven training, a feature unique to ONDUX. This assigns to ONDUX a high degree of flexibility and results in superior effectiveness, as demonstrated by the experimental evaluation we report with textual sources from different domains, in which ONDUX is compared with a state-of-art IETS approach.