Identifying syntactic differences between two programs
Software—Practice & Experience
A hierarchical approach to wrapper induction
Proceedings of the third annual conference on Autonomous Agents
Wrapper induction: efficiency and expressiveness
Artificial Intelligence - Special issue on Intelligent internet systems
IEPAD: information extraction based on pattern discovery
Proceedings of the 10th international conference on World Wide Web
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
Data extraction and label assignment for web databases
WWW '03 Proceedings of the 12th international conference on World Wide Web
Table extraction using conditional random fields
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Mining data records in Web pages
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Automatic web news extraction using tree edit distance
Proceedings of the 13th international conference on World Wide Web
Using the structure of Web sites for automatic segmentation of tables
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Fully automatic wrapper generation for search engines
WWW '05 Proceedings of the 14th international conference on World Wide Web
Web data extraction based on partial tree alignment
WWW '05 Proceedings of the 14th international conference on World Wide Web
A Survey of Web Information Extraction Systems
IEEE Transactions on Knowledge and Data Engineering
Extraction of flat and nested data records from web pages
AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analystics - Volume 61
Mining templates from search result records of search engines
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
WRAPPER INFERENCE FOR AMBIGUOUS WEB PAGES
Applied Artificial Intelligence
Extracting data records from the web using tag path clustering
Proceedings of the 18th international conference on World wide web
Juicer: Scalable Extraction for Thread Meta-information of Web Forum
PAISI '09 Proceedings of the Pacific Asia Workshop on Intelligence and Security Informatics
Information extraction for search engines using fast heuristic techniques
Data & Knowledge Engineering
Extracting Structured Data from Web Pages with Maximum Entropy Segmental Markov Model
WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Web news extraction based on path pattern mining
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 7
Automatic extraction of web data records containing user-generated content
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
A novel method for bilingual web page acquisition from search engine web records
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Online social network profile data extraction for vulnerability analysis
International Journal of Internet Technology and Secured Transactions
An indent shape based approach for web lists mining
WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
Extracting data records from query result pages based on visual features
BNCOD'11 Proceedings of the 28th British national conference on Advances in databases
Hybrid method for automated news content extraction from the web
WISE'06 Proceedings of the 7th international conference on Web Information Systems
Answering math queries with search engines
Proceedings of the 21st international conference companion on World Wide Web
TEX: An efficient and effective unsupervised Web information extractor
Knowledge-Based Systems
Towards Comparative Mining of Web Document Objects with NFA: WebOMiner System
International Journal of Data Warehousing and Mining
Web news extraction via path ratios
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Robust detection of semi-structured web records using a DOM structure-knowledge-driven model
ACM Transactions on the Web (TWEB)
Semantic extraction of geographic data from web tables for big data integration
Proceedings of the 7th Workshop on Geographic Information Retrieval
Scalable and noise tolerant web knowledge extraction for search task simplification
Decision Support Systems
Hi-index | 0.00 |
This paper studies automatic extraction of structured data from Web pages. Each of such pages may contain several groups of structured data records. Existing automatic methods still have several limitations. In this paper, we propose a more effective method for the task. Given a page, our method first builds a tag tree based on visual information. It then performs a post-order traversal of the tree and matches subtrees in the process using a tree edit distance method and visual cues. After the process ends, data records are found and data items in them are aligned and extracted. The method can extract data from both flat and nested data records. Experimental evaluation shows that the method performs the extraction task accurately.