The Web has established itself as the largest public data repository ever available. Although the vast majority of information on the Web is formatted to be easily read by humans, “meaningful information” remains largely inaccessible to computer applications. In this paper, we present automated algorithms that gather metadata and instance information by exploiting global regularities on the Web and incorporating contextual information. Our system is distinguished by requiring no domain-specific engineering. Experimental evaluations were performed on the TAP knowledge base and on the faculty-course home pages of computer science departments, comprising 16,861 Web pages.