Choosing document structure weights

Authors:
Andrew Trotman
Affiliations:
Department of Computer Science, University of Otago, Dunedin, New Zealand
Venue:
Information Processing and Management: an International Journal
Year:
2005

Abstract

Existing ranking schemes assume that all term occurrences in a given document have equal influence. Intuitively, terms occurring in some places should have a greater influence than those elsewhere: an occurrence in an abstract may be more important than an occurrence in the body text. Although this observation is not new, there remains the issue of finding good weights for each structure.

Vector space, probability, and Okapi BM25 ranking are extended to include structure weighting. Weights are then selected for the TREC WSJ collection using a genetic algorithm, and the learned weights are tested on an evaluation set of queries. Structure-weighted vector space inner product and structure-weighted probabilistic retrieval each show an improvement of about 5% in mean average precision over their unstructured counterparts. Structure-weighted BM25 shows almost no improvement. Analysis suggests BM25 cannot be improved using structure weighting.
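To make the idea of structure weighting concrete, the following is a minimal sketch of an inner product score in which each term occurrence is scaled by the weight of the structure it appears in. It is an illustration only, not the formulation used in the paper: the function names (`weighted_tf`, `score`), the document layout (a mapping from structure name to tokens), the `idf` table, and the example weights are all assumptions. In the setting the abstract describes, the weight values would be chosen by a genetic algorithm rather than set by hand.

```python
from collections import Counter


def weighted_tf(doc, weights):
    """Sum term occurrences, scaling each by its structure's weight.

    doc:     hypothetical layout, structure name -> list of tokens
    weights: structure name -> weight (structures not listed default to 1.0)
    """
    tf = Counter()
    for structure, tokens in doc.items():
        w = weights.get(structure, 1.0)
        for token in tokens:
            tf[token] += w
    return tf


def score(query, doc, weights, idf):
    """Inner product of query terms with structure-weighted tf * idf."""
    tf = weighted_tf(doc, weights)
    return sum(tf[term] * idf.get(term, 0.0) for term in query)


# Toy document with two structures (assumed names).
doc = {
    "abstract": ["genetic", "algorithm"],
    "body": ["genetic", "search", "search"],
}
idf = {"genetic": 1.0, "algorithm": 1.0, "search": 1.0}

# With uniform weights the score is the ordinary unstructured inner product.
uniform = score(["genetic"], doc, {"abstract": 1.0, "body": 1.0}, idf)

# Boosting the abstract makes the occurrence there count double.
boosted = score(["genetic"], doc, {"abstract": 2.0, "body": 1.0}, idf)

print(uniform, boosted)  # 2.0 3.0
```

Setting every weight to 1.0 recovers the unstructured score, which gives a natural baseline against which any learned weights can be compared.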