Dynamic reference sifting: a case study in the homepage domain
Selected papers from the sixth international conference on World Wide Web
Managing Gigabytes: Compressing and Indexing Documents and Images
Managing Gigabytes: Compressing and Indexing Documents and Images
Mining the Web: Discovering Knowledge from HyperText Data
Mining the Web: Discovering Knowledge from HyperText Data
Information Extraction with HMM Structures Learned by Stochastic Optimization
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Machine learning for information extraction in informal domains
Machine learning for information extraction in informal domains
Focused crawling for both topical relevance and quality of medical information
Proceedings of the 14th ACM international conference on Information and knowledge management
Link Contexts in Classifier-Guided Topical Crawlers
IEEE Transactions on Knowledge and Data Engineering
Architecture of a grid-enabled Web search engine
Information Processing and Management: an International Journal
Extracting content structure for web pages based on visual representation
APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications
NLP-based faceted search: Experience in the development of a science and technology search engine
Expert Systems with Applications: An International Journal
Hi-index | 0.00 |
We describe the architecture of an automatic domain-specific Web portal construction system. The system has three major components: i) a focused crawler that collects the domain-specific pages on the Web, ii) an information extraction engine that extracts useful fields from these Web pages, and iii) a query engine that allows both typical keyword based queries on the pages and advanced queries on the extracted data fields. We present a prototype system that works for the course homepages domain on the Web. A user study with the prototype system shows that our approach produces high quality results and achieves better precision figures than the typical keyword based search.