Extracting comparative, heterogeneous web content data, both derived and historical, from related web pages is still in its infancy. Discovering potentially useful and previously unknown knowledge from web content, for example by answering a query such as "list all articles on 'Sequential Pattern Mining' written between 2007 and 2011 including title, authors, volume, abstract, paper, citation, year of publication," requires finding the schema of web documents from different web pages, integrating their content data, and building a virtual or physical data warehouse before the content can be extracted and mined from the database. This paper proposes a technique for automatic web content data extraction, the WebOMiner system, which models web sites of a specific domain, such as business-to-customer (B2C) web sites, as object-oriented database schemas. Wrappers based on non-deterministic finite state automata (NFA) are then built to recognize the content types of this domain and are used to extract related contents from data blocks into an integrated database for future second-level mining and deep knowledge discovery.
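To make the idea of an NFA-based wrapper concrete, the following is a minimal sketch and not the WebOMiner implementation. It assumes that each data block has already been reduced to a sequence of content-type labels; the labels `image`, `title`, and `price`, the `NFA` class, and the `PRODUCT_NFA` record pattern are all hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

EPS = ""  # label used here for epsilon (empty) transitions; an assumption of this sketch


class NFA:
    """Small NFA simulated by tracking the set of currently active states."""

    def __init__(self, start: int, accepting: Set[int],
                 transitions: Dict[Tuple[int, str], Set[int]]) -> None:
        self.start = start
        self.accepting = set(accepting)
        self.transitions = transitions

    def _closure(self, states: Set[int]) -> Set[int]:
        # Add every state reachable through epsilon transitions alone.
        seen = set(states)
        stack = list(states)
        while stack:
            state = stack.pop()
            for target in self.transitions.get((state, EPS), ()):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

    def accepts(self, symbols: Iterable[str]) -> bool:
        current = self._closure({self.start})
        for symbol in symbols:
            reached: Set[int] = set()
            for state in current:
                reached |= set(self.transitions.get((state, symbol), ()))
            current = self._closure(reached)
            if not current:  # no active state left, so the block cannot match
                return False
        return bool(current & self.accepting)


# Hypothetical B2C product record: optional image, then a title, then one or more prices.
PRODUCT_NFA = NFA(
    start=0,
    accepting={3},
    transitions={
        (0, "image"): {1},
        (0, EPS): {1},      # the image is optional
        (1, "title"): {2},
        (2, "price"): {3},
        (3, "price"): {3},  # e.g. a list price followed by a sale price
    },
)


def extract_products(blocks: List[List[str]]) -> List[List[str]]:
    """Keep only the data blocks whose label sequence the wrapper recognizes."""
    return [block for block in blocks if PRODUCT_NFA.accepts(block)]
```

In a fuller pipeline, the blocks accepted by such a wrapper would be the ones whose contents are written into the integrated database, while rejected blocks (navigation, advertisements, and similar noise) would be passed to other wrappers or discarded.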