Not only is Wikipedia a comprehensive source of quality information, it also has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper presents three novel techniques for increasing recall from Wikipedia's long tail of sparse classes: (1) shrinkage over an automatically learned subsumption taxonomy, (2) a retraining technique for improving the training data, and (3) supplementing results by extracting from the broader Web. Our experiments compare design variations and show that, used in concert, these techniques increase recall by a factor of 1.76 to 8.71 while maintaining or increasing precision.
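To make the idea behind technique (1) concrete, here is a minimal sketch of shrinkage in its generic form: the estimate for a sparse class is smoothed by mixing it with estimates from its ancestors in a taxonomy. This is not the paper's implementation. The class names, counts, the `parent` map, and the hand-fixed mixing weights are all hypothetical and chosen only for illustration; a real system would learn the weights from data.

```python
from collections import Counter


def shrunk_estimate(word, cls, counts, parent, weights):
    """Mix per-class relative frequencies of `word` along the path cls -> root.

    counts  : dict mapping class name -> Counter of word occurrences
    parent  : dict mapping class name -> parent class name (root is absent)
    weights : one mixing weight per node on the path, ordered from cls to root
    """
    # Collect the path from the sparse class up to the root of the taxonomy.
    path = []
    node = cls
    while node is not None:
        path.append(node)
        node = parent.get(node)

    if len(weights) != len(path):
        raise ValueError("need exactly one weight per node on the path")

    estimate = 0.0
    for weight, node in zip(weights, path):
        total = sum(counts[node].values())
        # Relative frequency of the word in this class (0 if the class is empty).
        frequency = counts[node][word] / total if total else 0.0
        estimate += weight * frequency
    return estimate


# Hypothetical toy data: a sparse class and its better-populated parent.
counts = {
    "performer": Counter({"born": 1, "album": 1}),
    "person": Counter({"born": 8, "died": 2}),
}
parent = {"performer": "person"}
weights = [0.5, 0.5]  # assumption: fixed by hand, not learned

died_given_performer = shrunk_estimate("died", "performer", counts, parent, weights)
born_given_performer = shrunk_estimate("born", "performer", counts, parent, weights)
```

In this toy data the sparse class never contains the word "died", so its own estimate would be zero; after mixing with the parent class the estimate becomes nonzero, which is the kind of effect that lets an extractor trained on little data still recognize evidence seen only in related classes.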