A New Study on Using HTML Structures to Improve Retrieval

Authors:
M. Cutler, H. Deng, S. S. Maniccam, W. Meng
Venue:
ICTAI '99 Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence
Year:
1999

Abstract

Locating useful information effectively on the World Wide Web (WWW) is of wide interest. This paper presents new results on a methodology for using the structures and hyperlinks of HTML documents to improve the effectiveness of retrieving HTML documents. The methodology partitions the occurrences of terms in a document collection into classes according to the tags in which each term appears (such as Title, H1-H6, and Anchor). The rationale is that terms appearing in different structures of a document may differ in how significant they are for identifying the document. The term-weighting schemes of traditional information retrieval were extended to include class importance values. We implemented a genetic algorithm to determine a "best so far" combination of class importance factors. Our experiments indicate that this technique can improve retrieval effectiveness by 39.6% or more.
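The two steps in the abstract, partitioning term occurrences by tag class and searching for class importance values (CIVs) with a genetic algorithm, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical class set (Title, a single Header class for H1-H6, Anchor, and Plain for all other text). It also assumes that a term's weighted frequency is the CIV-weighted sum of its per-class counts, which could then stand in for raw tf in a tf-idf style scheme. The names `TAG_CLASS`, `class_weighted_tf`, and `ga_search_civ` are hypothetical. The GA operators (tournament selection, uniform crossover, per-gene mutation) and the integer value range are illustrative choices, not necessarily those used in the study. The fitness function is supplied by the caller, for example a retrieval-effectiveness score over a set of judged queries.

```python
import random
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Hypothetical tag-to-class mapping: the abstract names Title, H1-H6 and Anchor
# as examples; H1-H6 are merged into one class and all other text is "plain".
TAG_CLASS = {"title": "title", "a": "anchor",
             **{f"h{i}": "header" for i in range(1, 7)}}
CLASSES = ("title", "header", "anchor", "plain")


class ClassTermCounter(HTMLParser):
    """Counts term occurrences per class, using the innermost mapped tag."""

    def __init__(self):
        super().__init__()
        self.stack = []                 # currently open tags that map to a class
        self.tf = defaultdict(Counter)  # class -> term -> raw count

    def handle_starttag(self, tag, attrs):
        if tag in TAG_CLASS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag, tolerating unclosed inner tags.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        cls = TAG_CLASS[self.stack[-1]] if self.stack else "plain"
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.tf[cls][term] += 1


def class_weighted_tf(html, civ):
    """Weighted tf(t) = sum over classes c of civ[c] * tf_c(t)."""
    parser = ClassTermCounter()
    parser.feed(html)
    parser.close()
    weighted = Counter()
    for cls, counts in parser.tf.items():
        for term, n in counts.items():
            weighted[term] += civ[cls] * n
    return weighted


def ga_search_civ(fitness, classes=CLASSES, values=tuple(range(1, 9)),
                  pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Toy GA over CIV dicts; returns the best-so-far combination found."""
    rng = random.Random(seed)
    population = [{c: rng.choice(values) for c in classes}
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick():
        # Tournament selection of size 2.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = pick(), pick()
            # Uniform crossover: each class value comes from either parent.
            child = {c: (p1 if rng.random() < 0.5 else p2)[c] for c in classes}
            for c in classes:  # per-gene mutation
                if rng.random() < mutation_rate:
                    child[c] = rng.choice(values)
            offspring.append(child)
        population = offspring
        best = max(population + [best], key=fitness)  # keep best so far
    return best
```

For example, with `civ = {"title": 4.0, "header": 2.0, "anchor": 3.0, "plain": 1.0}` and the document `<title>Genetic search</title><h1>Search</h1><p>search the <a href='x'>web</a></p>`, the term `search` occurs once each in the title, header, and plain classes, so its weighted frequency is 4 + 2 + 1 = 7.0, while `web` (anchor only) gets 3.0. Keeping the best-so-far individual across generations mirrors the abstract's "best so far" wording; a real fitness function would re-rank documents with each candidate CIV and score the result.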