Organizing and searching the world wide web of facts -- step two: harnessing the wisdom of the crowds

Authors:
Marius Paşca
Affiliations:
Google Inc., Mountain View, CA
Venue:
Proceedings of the 16th international conference on World Wide Web
Year:
2007

Abstract

As part of a broader effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly supervised extraction of class attributes (e.g., side effects and generic equivalent for drugs) from anonymized query logs. The extraction is guided by a small set of seed attributes, without any need for handcrafted extraction patterns or further domain-specific knowledge. The attributes extracted for classes pertaining to various domains of interest to Web search users reach accuracy levels significantly exceeding the current state of the art. Inherently noisy search queries are shown to be a highly valuable, albeit unexplored, resource for Web-based information extraction, in particular for the task of class attribute extraction.
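
The abstract gives no implementation details, so the following is only a minimal sketch of how seed-guided attribute extraction from queries might look, not the paper's actual procedure. It assumes that each query starts or ends with a known class instance, that the rest of the query (trimmed of stop words) is a candidate attribute, and that candidates can be ranked by cosine similarity between their query-template counts and those of the seed attributes. The names `STOPWORDS`, `split_query`, `build_vectors`, and `rank_attributes`, the `A`/`I` template placeholders, the choice of cosine similarity, and the toy queries are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list, used only to trim candidate phrases.
STOPWORDS = {"what", "is", "are", "the", "of", "for", "a", "an", "in"}


def split_query(query, instance):
    """Return (candidate, template) if the query starts or ends with the
    instance, else None. 'A' marks the candidate, 'I' the instance."""
    q = query.lower().split()
    inst = instance.lower().split()
    n = len(inst)
    if len(q) <= n:
        return None
    if q[-n:] == inst:
        rest, instance_last = q[:-n], True
    elif q[:n] == inst:
        rest, instance_last = q[n:], False
    else:
        return None
    # Trim stop words from both edges of the remainder.
    start, end = 0, len(rest)
    while start < end and rest[start] in STOPWORDS:
        start += 1
    while end > start and rest[end - 1] in STOPWORDS:
        end -= 1
    if start == end:
        return None
    candidate = " ".join(rest[start:end])
    context = rest[:start] + ["A"] + rest[end:]
    template = " ".join(context + ["I"] if instance_last else ["I"] + context)
    return candidate, template


def build_vectors(queries, instances):
    """Map each candidate attribute to a Counter of the templates it occurs in."""
    vectors = defaultdict(Counter)
    for query in queries:
        for instance in instances:
            parsed = split_query(query, instance)
            if parsed:
                candidate, template = parsed
                vectors[candidate][template] += 1
                break
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def rank_attributes(queries, instances, seeds):
    """Rank non-seed candidates by similarity to the pooled seed vector."""
    vectors = build_vectors(queries, instances)
    seed_vector = Counter()
    for seed in seeds:
        seed_vector.update(vectors.get(seed, Counter()))
    scored = [
        (cosine(vec, seed_vector), cand)
        for cand, vec in vectors.items()
        if cand not in seeds
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(cand, score) for score, cand in scored]


# Toy data, invented purely for illustration.
queries = [
    "side effects of aspirin",
    "lipitor side effects",
    "what is the generic equivalent for lipitor",
    "dosage of ibuprofen",
    "aspirin dosage",
    "ibuprofen interactions",
    "buy lipitor",
    "cheap aspirin",
]
instances = ["aspirin", "lipitor", "ibuprofen"]
seeds = ["side effects", "generic equivalent"]

for attribute, score in rank_attributes(queries, instances, seeds):
    print(f"{attribute}\t{score:.3f}")
```

On this toy input, candidates that share query templates with the seeds (such as "dosage") rank above phrases like "buy" that appear in different templates. A real query log would need far more robust instance matching and noise filtering than this sketch provides.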