The parallel path framework for entity discovery on the web
ACM Transactions on the Web (TWEB)
In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and an example entity-page (e.g., a department Web site and one faculty member's homepage), we seek to find all entity-pages of the same type (e.g., the homepages of all faculty members in the department). To do this, we propose a Web structure mining method that grows parallel paths through the Web graph and DOM trees. We show that these parallel paths let us efficiently discover all entity-pages of the same type. Finally, we demonstrate the accuracy of our method with a case study on various domains.
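The intuition behind parallel paths can be illustrated with a small sketch. It is not the method from the paper, only a one-step simplification under stated assumptions: the site is a hypothetical toy dictionary (`SITE`) mapping each page to its outgoing links, each link is annotated with the DOM path of its anchor element, and two links count as "parallel" when they leave the same page through the same DOM path. The function name `find_parallel_pages` and all URLs are invented for illustration.

```python
# Hypothetical toy site (an assumption, not data from the paper):
# page -> list of (dom_path_of_anchor, target_page) links.
SITE = {
    "/dept": [
        ("html/body/nav/a", "/about"),
        ("html/body/ul/li/a", "/people/alice"),
        ("html/body/ul/li/a", "/people/bob"),
        ("html/body/ul/li/a", "/people/carol"),
        ("html/body/footer/a", "/contact"),
    ],
    "/about": [("html/body/nav/a", "/dept")],
}


def find_parallel_pages(site, seed):
    """Return pages linked from the same page via the same DOM path as seed.

    Simplified one-step illustration: a real system would follow longer
    paths through both the link graph and the DOM trees.
    """
    found = set()
    for page, links in site.items():
        # DOM paths on this page whose link points at the seed entity-page.
        seed_paths = {path for path, target in links if target == seed}
        # Every link sharing one of those DOM paths is a parallel candidate.
        for path, target in links:
            if path in seed_paths:
                found.add(target)
    return sorted(found)


print(find_parallel_pages(SITE, "/people/alice"))
# → ['/people/alice', '/people/bob', '/people/carol']
```

Starting from one faculty homepage, the sketch recovers the sibling homepages listed in the same DOM position and ignores the navigation and footer links, which sit on different DOM paths.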