Keyword spices: a new method for building domain-specific web search engines

Authors:
Satoshi Oyama; Takashi Kokubo; Toru Ishida; Teruhiro Yamada; Yasuhiko Kitamura
Affiliations:
Department of Social Informatics, Kyoto University, Kyoto, Japan; NTT Docomo, Inc. and Department of Social Informatics, Kyoto University, Kyoto, Japan; Department of Social Informatics, Kyoto University, Kyoto, Japan; SANYO Electric Co., Ltd. and Laboratories of Image Information Science and Technology; Department of Information and Communication Engineering, Osaka City University, Osaka, Japan
Venue:
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Year:
2001

Abstract

This paper presents a new method for building domain-specific web search engines. Previous methods eliminate irrelevant documents from the retrieved pages using heuristics based on human knowledge of the domain in question. Consequently, such engines are hard to build, and their heuristics cannot be applied to other domains. The keyword spice method, in contrast, improves search performance by adding domain-specific keywords, called keyword spices, to the user's input query; the modified query is then forwarded to a general-purpose search engine. Keyword spices can be discovered automatically from web documents, which allows us to build high-quality domain-specific search engines in various domains without collecting heuristic knowledge. We describe a machine learning algorithm, a type of decision-tree learning algorithm, that extracts keyword spices. To demonstrate the value of the proposed approach, we conduct experiments in the cooking domain. The results confirm the excellent performance of our method in terms of both precision and recall.
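The query-modification step described in the abstract can be illustrated with a minimal Python sketch. It assumes the target search engine accepts boolean operators written as `AND`, `OR`, and `NOT`. The function name `add_keyword_spice` and the sample spice are hypothetical, invented for illustration, and not taken from the paper.

```python
def add_keyword_spice(user_query: str, spice: str) -> str:
    """Combine a user's query with a domain-specific boolean expression.

    Assumption: the downstream general-purpose search engine understands
    parenthesised boolean queries using AND / OR / NOT.
    """
    query = user_query.strip()
    if not query:
        raise ValueError("user query must not be empty")
    # Parenthesise both parts so operator precedence cannot mix them up.
    return f"({query}) AND ({spice})"


# Hypothetical spice for the cooking domain; these terms are illustrative
# only and are not the expression learned in the paper.
COOKING_SPICE = "ingredients OR (recipe AND NOT restaurant)"

if __name__ == "__main__":
    # The modified query would then be sent to a general-purpose engine.
    print(add_keyword_spice("beef", COOKING_SPICE))
```

In this sketch the spice is a fixed string. In the method the abstract describes, it would instead be extracted automatically from web documents by the decision-tree learning algorithm.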