The HiLeX system for semantic information extraction

Authors:
Marco Manna;Ermelinda Oro;Massimo Ruffolo;Mario Alviano;Nicola Leone
Affiliations:
Department of Mathematics, University of Calabria, Italy;DEIS, University of Calabria, Italy;ICAR-CNR, University of Calabria, Italy;Department of Mathematics, University of Calabria, Italy;Department of Mathematics, University of Calabria, Italy
Venue:
Transactions on Large-Scale Data- and Knowledge-Centered Systems V
Year:
2012

Abstract

The explosive growth and popularity of the Web have resulted in a huge number of digital information sources on the Internet. Unfortunately, such sources manage only data, rather than the knowledge they carry. Recognizing, extracting, and structuring relevant information according to its semantics is therefore a crucial task. Several approaches in the field of Information Extraction (IE) have been proposed to support the translation of semi-structured and unstructured documents into structured data or knowledge. Most of them achieve high precision but, being mainly syntactic, they often have low recall, depend on the document format, and ignore the semantics of the information they extract. In this paper, we describe a new approach to semantic information extraction that could serve as the basis for automatically extracting highly structured data from unstructured web sources without any undesirable trade-off between precision and recall. In short, the approach (i) is ontology-driven, (ii) is based on a unified representation of documents, (iii) integrates existing IE techniques, (iv) implements semantic regular expressions, (v) has been implemented through Answer Set Programming, (vi) is employed in real-world applications, and (vii) is receiving positive feedback from business customers.
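The abstract names semantic regular expressions but does not define their syntax. As a rough intuition only, the following sketch assumes that such an expression is an ordinary regular expression evaluated over the sequence of ontology concepts annotated on the tokens, rather than over raw characters. The toy lexicon, the concept codes, and the names `LEXICON`, `CODES`, `concept_of`, and `semantic_match` are hypothetical. This is not HiLeX's actual pattern language, and the sketch ignores both the Answer Set Programming implementation and the unified document representation.

```python
import re

# Hypothetical toy ontology lexicon: surface word -> concept name (not from HiLeX).
LEXICON = {
    "cosenza": "City",
    "rende": "City",
    "euro": "Currency",
    "eur": "Currency",
    "hotel": "Accommodation",
    "b&b": "Accommodation",
}

# One single-character code per concept, so regex offsets over the encoded
# string line up one-to-one with token indices. Codes avoid regex metacharacters.
CODES = {"City": "C", "Currency": "M", "Accommodation": "A", "Number": "N", "Other": "o"}


def concept_of(token: str) -> str:
    """Annotate a token with a concept: lexicon lookup first, then a numeric check."""
    lowered = token.lower()
    if lowered in LEXICON:
        return LEXICON[lowered]
    if re.fullmatch(r"\d+(?:[.,]\d+)?", token):
        return "Number"
    return "Other"


def semantic_match(pattern: str, tokens: list[str]) -> list[list[str]]:
    """Return the token spans whose concept sequence matches `pattern`.

    `pattern` is an ordinary regex over concept codes, e.g. "NM" means
    a Number immediately followed by a Currency (a price).
    """
    encoded = "".join(CODES[concept_of(t)] for t in tokens)
    return [tokens[m.start():m.end()] for m in re.finditer(pattern, encoded)]
```

For example, with `tokens = "The Hotel Miramare in Cosenza costs 80 euro per night".split()`, the concept string is `oAooCoNMoo`, so `semantic_match("NM", tokens)` yields the price span `["80", "euro"]` and `semantic_match("Ao", tokens)` yields `["Hotel", "Miramare"]`. The point of the illustration is that the pattern refers to what tokens mean (City, Currency) rather than how they are spelled or laid out.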