A new approach to intranet search based on information extraction

Authors:
Hang Li;Yunbo Cao;Jun Xu;Yunhua Hu;Shenjie Li;Dmitriy Meyerzon
Affiliations:
Microsoft Research Asia, Haidian, Beijing, China;Microsoft Research Asia, Haidian, Beijing, China;Nankai University, Tianjin, China;Xi'an Jiaotong University, Xi'an, China;Hong Kong University of Science and Technology, Hong Kong, China;Microsoft Corporation, Redmond, WA
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Abstract

This paper is concerned with 'intranet search'. By intranet search, we mean searching for information on an intranet within an organization. Through an analysis of survey results and of search log data, we have found that search needs on an intranet can be categorized into types. The types include searching for definitions, persons, experts, and homepages. Traditional information retrieval focuses only on searching for relevant documents, not on searching for special types of information. We propose a new approach to intranet search in which we search for information in each of the special types, in addition to the traditional relevance search. Information extraction technologies can play key roles in this kind of 'search by type' approach, because we must first extract from the documents the necessary information for each type. We have developed an intranet search system called 'Information Desk'. In the system, we try to address the most important types of search first: finding term definitions, homepages of groups or topics, employees' personal information, and experts on topics. For each type of search, we use information extraction technologies to extract, fuse, and summarize information in advance. The system is in operation on the intranet of Microsoft and is accessed by about 500 employees per month. Feedback from users and system logs show that users consider the approach useful and that the system can really help people find information. This paper describes the architecture, features, component technologies, and evaluation results of the system.
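
The 'search by type' idea can be pictured with a small sketch. This is not the Information Desk implementation, which the abstract does not detail. It only assumes that records are extracted in advance, stored per search type, and looked up by type at query time. The class name `TypedIndex`, its methods, and the type labels are hypothetical, chosen to mirror the four types named above.

```python
from collections import defaultdict

# Hypothetical type labels mirroring the four search types in the abstract.
SEARCH_TYPES = ("definition", "homepage", "person", "expert")


class TypedIndex:
    """Toy store of records extracted in advance, keyed by search type and term."""

    def __init__(self):
        # One term -> records mapping per search type.
        self._records = {t: defaultdict(list) for t in SEARCH_TYPES}

    def _check(self, search_type):
        if search_type not in self._records:
            raise ValueError(f"unknown search type: {search_type}")

    def add(self, search_type, term, record):
        """Store one extracted record (done offline, before any query arrives)."""
        self._check(search_type)
        self._records[search_type][term.lower()].append(record)

    def search(self, search_type, query):
        """Return the records stored for this type and query term (exact match)."""
        self._check(search_type)
        return list(self._records[search_type].get(query.lower(), []))
```

A caller would populate the index offline, for example `index.add("definition", "intranet", "...")`, and then answer a typed query with `index.search("definition", "intranet")`. Exact-match lookup is a simplification made for brevity; a real system would rank candidates and combine them with ordinary relevance search.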