Semantic annotation of unstructured and ungrammatical text

Authors:
Matthew Michelson; Craig A. Knoblock
Affiliations:
University of Southern California, Information Sciences Institute, Marina del Rey, CA; University of Southern California, Information Sciences Institute, Marina del Rey, CA
Venue:
IJCAI'05: Proceedings of the 19th International Joint Conference on Artificial Intelligence
Year:
2005

Abstract

Vast amounts of free text on the internet are neither grammatical nor formally structured, such as item descriptions on eBay or internet classifieds like Craigslist. These data sources, called "posts," are full of useful information for agents scouring the Semantic Web, but they lack the semantic annotation that would make them searchable. Annotating these posts is difficult because the text exhibits little formal grammar and the structure varies from post to post. However, by leveraging collections of known entities and their common attributes, called "reference sets," we can annotate these posts despite their lack of grammar and structure. To use this reference data, we first align a post to a member of the reference set and then exploit the matched member during information extraction. We compare this approach to more traditional information extraction methods that rely on structural and grammatical characteristics, and we show that it outperforms them on this type of data.
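
The two steps named in the abstract, aligning a post to a reference-set member and then using that member to guide extraction, can be illustrated with a minimal sketch. This is not the authors' implementation: the car-listing reference set, the function names, the thresholds, and the use of `difflib.SequenceMatcher` as the string similarity are all assumptions made only for illustration, standing in for whatever matching and extraction models the paper actually uses.

```python
import re
from difflib import SequenceMatcher

# Hypothetical reference set: known entities with their common attributes.
REFERENCE_SET = [
    {"make": "Honda", "model": "Civic", "trim": "EX"},
    {"make": "Honda", "model": "Accord", "trim": "LX"},
    {"make": "Toyota", "model": "Corolla", "trim": "LE"},
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def token_similarity(a, b):
    """Character-level similarity in [0, 1]; tolerant of misspellings."""
    return SequenceMatcher(None, a, b).ratio()

def record_score(post_tokens, record):
    """Sum, over every token of every attribute value, the best similarity
    to any token in the post (an assumed, simplistic alignment score)."""
    total = 0.0
    for value in record.values():
        for value_token in tokenize(value):
            total += max(
                (token_similarity(value_token, pt) for pt in post_tokens),
                default=0.0,
            )
    return total

def align(post, reference_set):
    """Step 1: pick the reference-set member that best matches the post."""
    post_tokens = tokenize(post)
    return max(reference_set, key=lambda r: record_score(post_tokens, r))

def annotate(post, record, threshold=0.75):
    """Step 2: label each post token with the attribute of the matched
    record it most resembles, or None if nothing is similar enough."""
    labels = []
    for post_token in tokenize(post):
        best_attr, best_sim = None, 0.0
        for attr, value in record.items():
            for value_token in tokenize(value):
                sim = token_similarity(post_token, value_token)
                if sim > best_sim:
                    best_attr, best_sim = attr, sim
        labels.append((post_token, best_attr if best_sim >= threshold else None))
    return labels

# A made-up ungrammatical, misspelled post.
post = "93 hnda civc ex 4dr clean title $2500 obo"
match = align(post, REFERENCE_SET)
annotations = annotate(post, match)
for token, attr in annotations:
    print(f"{token:>6} -> {attr}")
```

The point of the sketch is that the matched record supplies the vocabulary for extraction, so misspelled tokens such as "hnda" can still be labeled without relying on grammar or a fixed post layout.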