Extracting data from the Web is an important information extraction task. Most existing approaches rely on wrappers, which require human knowledge and user interaction during extraction. This paper proposes conditional models as an alternative solution to this task. Drawing on the strengths of conditional models such as maximum entropy models and maximum entropy Markov models, our method offers three major advantages: full automation, the ability to incorporate various non-independent, overlapping features of different hypertext representations, and the ability to deal with missing and disordered data fields. Experimental results on a wide range of e-commerce websites with different layouts show that our method achieves a satisfactory trade-off between automation and accuracy, and that it provides a practical application of automated data extraction from the Web.
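
For reference, the two conditional models named above are usually written in the standard form sketched below. The abstract itself gives no formulas, so the notation here (feature functions f_i, weights \lambda_i, observation o, label or state s, position index t) is assumed for illustration and is not taken from the paper.

```latex
% Maximum entropy (conditional exponential) model:
% probability of label s given observation o, normalized over all labels s'
P(s \mid o) = \frac{1}{Z(o)} \exp\Big( \sum_{i} \lambda_i f_i(o, s) \Big),
\qquad
Z(o) = \sum_{s'} \exp\Big( \sum_{i} \lambda_i f_i(o, s') \Big)

% Maximum entropy Markov model:
% the same exponential form, additionally conditioned on the previous state s_{t-1}
P(s_t \mid s_{t-1}, o_t) = \frac{1}{Z(o_t, s_{t-1})}
    \exp\Big( \sum_{i} \lambda_i f_i(o_t, s_t, s_{t-1}) \Big)
```

Because the features enter only through a weighted sum inside the exponent, they are not required to be independent of one another, which is the property that lets overlapping features from different hypertext representations be combined in one model.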