Locating useful information effectively on the World Wide Web (WWW) is of wide interest. This paper presents new results on a methodology that uses the structures and hyperlinks of HTML documents to improve the effectiveness of retrieving HTML documents. The methodology partitions the occurrences of terms in a document collection into classes according to the tags in which a particular term appears (such as Title, H1-H6, and Anchor). The rationale is that terms appearing in different structures of a document may have different significance in identifying the document. The weighting schemes of traditional information retrieval were extended to include class importance values. We implemented a genetic algorithm to determine a "best so far" combination of class importance factors. Our experiments indicate that this technique can improve retrieval effectiveness by 39.6% or more.
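
The idea of class importance values searched by a genetic algorithm can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the four class names in `CLASSES`, the integer weight range, the plain weighted term-frequency score (the paper extends traditional weighting schemes, which are not reproduced here), the toy `fitness` measure, and the selection, crossover, and mutation operators in `evolve`.

```python
import random

# Hypothetical class names; the abstract names Title, H1-H6, and Anchor as examples.
CLASSES = ("plain", "title", "header", "anchor")


def weighted_tf(class_counts, civ):
    """Combine one term's per-class occurrence counts into a weighted frequency."""
    return sum(civ[c] * class_counts.get(c, 0) for c in CLASSES)


def score(query_terms, doc, civ):
    """Score a document as the sum of weighted frequencies of the query terms."""
    return sum(weighted_tf(doc.get(t, {}), civ) for t in query_terms)


def fitness(civ, queries, docs):
    """Toy fitness (an assumption): share of queries whose top document is the relevant one."""
    hits = 0
    for terms, relevant_id in queries:
        best_id = max(docs, key=lambda d: score(terms, docs[d], civ))
        hits += best_id == relevant_id
    return hits / len(queries)


def evolve(queries, docs, pop_size=20, generations=30, max_weight=8, seed=0):
    """Minimal genetic algorithm over integer class importance vectors."""
    rng = random.Random(seed)

    def random_civ():
        return {c: rng.randint(0, max_weight) for c in CLASSES}

    population = [random_civ() for _ in range(pop_size)]
    best = max(population, key=lambda v: fitness(v, queries, docs))
    for _ in range(generations):
        ranked = sorted(population, key=lambda v: fitness(v, queries, docs), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = {c: rng.choice((a[c], b[c])) for c in CLASSES}  # uniform crossover
            if rng.random() < 0.2:  # mutation: reset one class weight
                child[rng.choice(CLASSES)] = rng.randint(0, max_weight)
            children.append(child)
        population = children
        candidate = max(population, key=lambda v: fitness(v, queries, docs))
        if fitness(candidate, queries, docs) > fitness(best, queries, docs):
            best = candidate  # keep the "best so far" vector
    return best
```

In this sketch a document is a mapping from term to per-class counts, for example `{"retrieval": {"title": 1}}`, and a query is a pair of query terms and the identifier of the document judged relevant. With uniform weights, a document that mentions a term three times in body text outranks one that mentions it once in the title; raising the title weight reverses that order, which is the kind of effect the search over class importance values is meant to exploit.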