The Lixto project is an ongoing research effort in the area of Web data extraction. The project originally set out to develop a logic-based extraction language and a tool for visually defining extraction programs from sample Web pages, but its scope has broadened over time. Current work investigates new issues such as employing learning algorithms to define extraction programs, automatically extracting data from Web pages with a table-centric visual appearance, and extracting data from alternative document formats such as PDF.