The World Wide Web (WWW) is a gigantic information resource that grows daily. As more and more data are added to the WWW, it is becoming increasingly difficult to locate useful information in it effectively. In this paper, we propose a method that uses the structure and hyperlinks of HTML documents to improve the effectiveness of retrieving HTML documents. Our study assigns the occurrences of terms in a document collection to six classes according to the tags in which a particular term appears (such as Title, H1-H6, and Anchor). Based on this assignment, we extend the weighting schemes of traditional information retrieval by assigning different importance factors to terms in different classes. The rationale is that terms appearing in different parts of a document may differ in how well they identify the document. For this research we have built a Web-based search tool, Webor, created a testbed, and conducted extensive experiments to determine an optimal combination of class importance factors. Our study indicates that a substantial improvement in retrieval effectiveness can be achieved using this technique.
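The class-based weighting described above can be illustrated with a minimal sketch. It is not the Webor implementation. The abstract names only Title, H1-H6, and Anchor, so the sketch assumes a catch-all `plain` class for everything else and uses four classes rather than six. The names `TAG_CLASS`, `IMPORTANCE`, `ClassCounter`, and `weighted_tf` are hypothetical, and the importance values are illustrative, not the optimal combination found in the experiments. As a further simplification, anchor text is counted in the document where it appears.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser
import re

# Hypothetical mapping from HTML tags to term classes (assumption, not from the paper).
TAG_CLASS = {"title": "title", "a": "anchor",
             **{f"h{i}": "header" for i in range(1, 7)}}

# Hypothetical class importance factors; illustrative values only.
IMPORTANCE = {"plain": 1.0, "header": 2.0, "title": 4.0, "anchor": 3.0}


class ClassCounter(HTMLParser):
    """Counts term occurrences per class, using the innermost mapped tag."""

    def __init__(self):
        super().__init__()
        self.stack = []                     # classes of currently open mapped tags
        self.counts = defaultdict(Counter)  # class -> term -> occurrence count

    def handle_starttag(self, tag, attrs):
        if tag in TAG_CLASS:
            self.stack.append(TAG_CLASS[tag])

    def handle_endtag(self, tag):
        # Simplification: assumes mapped tags are properly nested.
        if tag in TAG_CLASS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text outside any mapped tag falls into the catch-all "plain" class.
        cls = self.stack[-1] if self.stack else "plain"
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.counts[cls][term] += 1


def weighted_tf(html):
    """Return term -> sum over classes of (importance factor * class count)."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    scores = Counter()
    for cls, terms in parser.counts.items():
        for term, n in terms.items():
            scores[term] += IMPORTANCE[cls] * n
    return dict(scores)


page = ("<html><head><title>Web search</title></head><body>"
        "<h1>Search</h1><p>search the web <a href='x'>web tools</a></p>"
        "</body></html>")
scores = weighted_tf(page)
```

Under these assumed factors, a term in the title contributes four times as much as the same term in plain text. In a full system, a weighted frequency of this kind would presumably replace the raw term frequency in a conventional scheme such as tf-idf, with the factors tuned against a testbed.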