Metadata Based Web Mining for Topic-Specific Information Gathering
EC-WEB '00 Proceedings of the First International Conference on Electronic Commerce and Web Technologies
Automatic extraction of titles from general documents using machine learning
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Automatic extraction of titles from general documents using machine learning
Information Processing and Management: an International Journal
This paper presents a relevant term discoverer, a system that discovers topics relevant to a given topic from the World Wide Web. The system mines hyperlink metadata on the basis of the association of terms in the metadata. It also applies various filtering techniques to detect false positives and false negatives. Applications of the system include: i) topic-specific information gathering systems that need to crawl resources on the relevant topics, ii) bibliography search systems that need to extend their search to articles on relevant topics, iii) classification systems that group items of a similar class together, and so on. We report a successful application of the system in building a topic-specific search engine dedicated to eXtensible Markup Language (XML). Using the algorithms presented in this paper, we were able to identify the relevant topics that the search engine needs to cover. Together with effective topic-directed crawling algorithms, we were able to build a topic-specific search engine that requires significantly less human labor but performs almost as well as topic-specific search engines whose content is maintained by humans.
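The abstract does not spell out how term association is computed, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes the hyperlink metadata is available as a list of anchor-text strings, scores each term by how often it co-occurs with the seed topic, and uses simple support and confidence thresholds as a stand-in for the paper's filtering techniques. The names `relevant_terms`, `tokenize`, `STOPWORDS`, `min_support`, and `min_confidence` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "for", "to", "in", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms, dropping stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def relevant_terms(anchor_texts, seed, min_support=2, min_confidence=0.5):
    """Rank terms by their association with `seed` across anchor texts.

    Assumption: association is measured as
    confidence = (anchors containing both term and seed) / (anchors containing term).
    The two thresholds are an illustrative filter, not the paper's.
    """
    seed = seed.lower()
    term_count = Counter()  # number of anchors containing each term
    co_count = Counter()    # number of anchors containing both the term and the seed

    for text in anchor_texts:
        terms = set(tokenize(text))
        term_count.update(terms)
        if seed in terms:
            co_count.update(terms - {seed})

    scored = {}
    for term, co in co_count.items():
        confidence = co / term_count[term]
        # Low support suggests a chance co-occurrence (a likely false positive).
        if co >= min_support and confidence >= min_confidence:
            scored[term] = confidence

    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
```

Called with a handful of anchor texts and the seed `"xml"`, the function returns the terms that appear alongside the seed often enough to pass both thresholds. Lowering `min_support` admits rarer terms at the cost of more noise.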