A new approach to intranet search based on information extraction

  • Authors:
  • Hang Li;Yunbo Cao;Jun Xu;Yunhua Hu;Shenjie Li;Dmitriy Meyerzon

  • Affiliations:
  • Microsoft Research Asia, Haidian, Beijing, China;Microsoft Research Asia, Haidian, Beijing, China;Nankai University, Tianjin, China;Xi'an Jiaotong University, Xi'an, China;Hong Kong University of Science and Technology, Hong Kong, China;Microsoft Corporation, Redmond, WA

  • Venue:
  • Proceedings of the 14th ACM international conference on Information and knowledge management
  • Year:
  • 2005

Abstract

This paper is concerned with 'intranet search', by which we mean searching for information on an intranet within an organization. Through an analysis of survey results and search log data, we have found that search needs on an intranet can be categorized into types, including searching for definitions, persons, experts, and homepages. Traditional information retrieval focuses only on searching for relevant documents, not on searching for special types of information. We propose a new approach to intranet search in which we search for information of each special type, in addition to performing traditional relevance search. Information extraction technologies can play key roles in this kind of 'search by type' approach, because the necessary information of each type must first be extracted from the documents. We have developed an intranet search system called 'Information Desk'. In the system, we address the most important types of search first: finding term definitions, homepages of groups or topics, employees' personal information, and experts on topics. For each type of search, we use information extraction technologies to extract, fuse, and summarize information in advance. The system is in operation on the intranet of Microsoft and is accessed by about 500 employees per month. Feedback from users and system logs show that users consider the approach useful and that the system helps people find information. This paper describes the architecture, features, component technologies, and evaluation results of the system.
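As a rough illustration of the 'search by type' idea in the abstract, the following is a minimal sketch and not the Information Desk implementation. It assumes a toy regular-expression extractor for definitions only; the names `DEFINITION_PATTERN`, `extract_definitions`, `relevance_search`, and `search` are hypothetical. Information is extracted in advance into a per-type index, and a query either looks up that index or falls back to a plain keyword match standing in for traditional relevance search.

```python
import re
from collections import defaultdict

# Hypothetical pattern for definition-like sentences: "<Term> is a/an/the <description>."
DEFINITION_PATTERN = re.compile(
    r"^(?P<term>[A-Z][\w .-]*?) (?:is|are) (?P<body>(?:a|an|the) [^.]+)\."
)


def extract_definitions(documents):
    """Offline step: scan every document and index candidate definitions by term."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        # Naive sentence split on whitespace that follows a period.
        for sentence in re.split(r"(?<=\.)\s+", text):
            sentence = sentence.strip()
            match = DEFINITION_PATTERN.match(sentence)
            if match:
                index[match.group("term").lower()].append((doc_id, sentence))
    return index


def relevance_search(documents, query):
    """Stand-in for traditional relevance search: case-insensitive substring match."""
    needle = query.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]


def search(documents, type_indexes, query, search_type=None):
    """Query step: use the pre-built index for a known type, else fall back."""
    if search_type in type_indexes:
        return type_indexes[search_type].get(query.lower(), [])
    return relevance_search(documents, query)
```

In this sketch, only the definition type is indexed; the other types named in the abstract (homepages, persons, experts) would each need their own extractor and index, whose design the abstract does not specify.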