Personal name classification in web queries

Authors:
Dou Shen;Toby Walker;Zijian Zheng;Qiang Yang;Ying Li
Affiliations:
Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA;Hong Kong University of Science and Technology;Microsoft Corporation, Redmond, WA
Venue:
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Year:
2008

Abstract

Personal names are an important kind of Web query, yet they are special in many ways. Strategies for retrieving information on personal names should therefore differ from those used for other types of queries. To improve search quality for personal names, a first step is to detect whether a query is a personal name. Despite the importance of this problem, relatively little research has addressed it. Since Web queries are usually short, conventional supervised machine-learning algorithms cannot be applied directly. An alternative is to apply heuristic rules coupled with name-term dictionaries. However, when the dictionaries are small, this method tends to produce false negatives; when they are large, it tends to produce false positives. A more serious problem is that this method cannot provide a good trade-off between precision and recall. To solve these problems, we propose an approach based on the construction of probabilistic name-term dictionaries and personal name grammars, and use it to predict the probability that a query is a personal name. In this paper, we develop four different methods for building probabilistic name-term dictionaries, in which each term is assigned the probability that it is a name term. We compare our approach with baseline algorithms, such as dictionary-based look-up methods and supervised classification algorithms including logistic regression and SVM, on manually labeled test sets. The results validate the effectiveness of our approach, whose F1 value is more than 79.8%, outperforming the best baseline by more than 11.3%.
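
The abstract does not spell out how the probabilistic dictionaries and name grammars are combined, so the following is only a minimal sketch of the general idea, not the authors' algorithm. The dictionary entries, their probabilities, the grammar patterns, and the rule of multiplying per-term probabilities are all assumptions made for illustration.

```python
# Hypothetical probabilistic name-term dictionaries: each term maps to an
# assumed probability of being a first-name or last-name term.
# These numbers are invented for illustration and do not come from the paper.
FIRST_NAME_PROB = {"john": 0.95, "mary": 0.97, "paris": 0.30, "rose": 0.40}
LAST_NAME_PROB = {"smith": 0.90, "hilton": 0.60, "rose": 0.35, "ford": 0.45}

DICTIONARIES = {"FIRST": FIRST_NAME_PROB, "LAST": LAST_NAME_PROB}

# Hypothetical personal-name grammar: each pattern is a sequence of slot types.
GRAMMAR = [
    ("FIRST", "LAST"),
    ("LAST", "FIRST"),
    ("FIRST", "FIRST", "LAST"),
]


def name_probability(query: str) -> float:
    """Score a query as the best-matching grammar pattern.

    Assumption: a pattern's score is the product of the per-term
    probabilities for its slots; unknown terms get probability 0.
    """
    terms = query.lower().split()
    best = 0.0
    for pattern in GRAMMAR:
        if len(pattern) != len(terms):
            continue  # the pattern must cover the whole query
        score = 1.0
        for slot, term in zip(pattern, terms):
            score *= DICTIONARIES[slot].get(term, 0.0)
        best = max(best, score)
    return best


def is_personal_name(query: str, threshold: float = 0.5) -> bool:
    """Turn the score into a decision; the threshold is a tunable assumption."""
    return name_probability(query) >= threshold


if __name__ == "__main__":
    for q in ["John Smith", "paris hilton", "cheap flights"]:
        print(q, round(name_probability(q), 3), is_personal_name(q))
```

Because the output is a score rather than a yes/no dictionary hit, raising or lowering `threshold` trades recall for precision, which is the trade-off the abstract says plain dictionary look-up cannot offer.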