Concordance-Based Entity-Oriented Search

Authors:
Mikhail Bautin; Steven Skiena
Venue:
WI '07 Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2007

Abstract

We consider the problem of finding the relevant named entities in response to a search query over a given text corpus. Entity search can readily be used to augment conventional web search engines for a variety of applications. To assess the significance of entity search, we analyzed the AOL dataset of 36 million web search queries with respect to two different sets of entities: (a) 2.3 million distinct entities extracted from a news text corpus and (b) 2.9 million Wikipedia article titles. The results clearly indicate that search engines should be aware of entities: depending on the matching criterion, between 18% and 39% of all web search queries can be recognized as specifically searching for entities, while between 73% and 87% of all queries contain entities. Our entity search engine creates a concordance document for each entity, consisting of all the sentences in the corpus that contain that entity. We then index and search these documents using open-source search software, which yields a ranked list of entities as the search result. Visit http://www.textmap.com for a demonstration of our entity search engine over a large news corpus. We evaluate our system by comparing the results of each query to the list of entities that have the highest statistical juxtaposition scores with the queried entity. The juxtaposition score measures how strongly two entities are related, expressed in terms of a probabilistic upper bound. The results show excellent performance, particularly over well-characterized classes of entities such as people.
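The concordance idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `build_concordances` and `rank_entities` are hypothetical, entity mentions are found by plain case-insensitive substring matching rather than by a named-entity recognizer, and a toy TF-IDF score stands in for the open-source search software that the abstract mentions but does not name.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_concordances(sentences, entities):
    """Map each entity to the list of sentences that mention it.

    Assumption: a mention is a case-insensitive substring match.
    """
    concordances = defaultdict(list)
    for sentence in sentences:
        lowered = sentence.lower()
        for entity in entities:
            if entity.lower() in lowered:
                concordances[entity].append(sentence)
    return dict(concordances)


def rank_entities(query, concordances):
    """Rank entities by a toy TF-IDF score of the query against
    each entity's concordance document (best match first)."""
    docs = {e: Counter(tokenize(" ".join(s))) for e, s in concordances.items()}
    n_docs = len(docs)
    scores = {}
    for entity, counts in docs.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many concordances contain the term.
            df = sum(1 for c in docs.values() if term in c)
            if df and counts[term]:
                score += counts[term] * math.log(1 + n_docs / df)
        if score > 0:
            scores[entity] = score
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative data, not taken from the paper's corpus.
sentences = [
    "Marie Curie won the Nobel Prize in physics.",
    "Marie Curie studied radioactivity in Paris.",
    "Albert Einstein developed the theory of relativity.",
    "Albert Einstein won the Nobel Prize in 1921.",
]
entities = ["Marie Curie", "Albert Einstein", "Paris"]
concordances = build_concordances(sentences, entities)
print(rank_entities("relativity", concordances))
```

Because each entity is represented by its own document, any ordinary document-ranking method returns entities directly; the scoring function used here is only a placeholder for whatever ranking the underlying search software provides.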