The internet holds vast amounts of free text that is neither grammatical nor formally structured, such as item descriptions on eBay or internet classifieds like Craigslist. These sources of data, called "posts," are full of useful information for agents scouring the Semantic Web, but they lack the semantic annotation that would make them searchable. Annotating these posts is difficult because the text generally exhibits little formal grammar and the structure varies from post to post. However, by leveraging collections of known entities and their common attributes, called "reference sets," we can annotate these posts despite their lack of grammar and structure. To use this reference data, we align a post to a member of the reference set and then exploit the matched member during information extraction. We compare this extraction approach to more traditional information extraction methods that rely on structural and grammatical characteristics, and we show that our approach outperforms traditional methods on this type of data.
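The two steps described above, aligning a post to a reference-set member and then using that member to guide extraction, can be illustrated with a minimal Python sketch. This is not the method evaluated in the paper: the reference set, the sample post, the function names, the 0.7 threshold, and the use of `difflib.SequenceMatcher` as the string-similarity measure are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical reference set: known entities and their common attributes.
REFERENCE_SET = [
    {"make": "Honda", "model": "Civic", "trim": "EX"},
    {"make": "Honda", "model": "Accord", "trim": "LX"},
    {"make": "Toyota", "model": "Corolla", "trim": "LE"},
]


def similarity(a, b):
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(post_tokens, record):
    """Average, over attributes, of the best similarity to any post token."""
    best_per_attr = [
        max(similarity(value, token) for token in post_tokens)
        for value in record.values()
    ]
    return sum(best_per_attr) / len(best_per_attr)


def align(post, reference_set):
    """Step 1: pick the reference-set member that best matches the post."""
    post_tokens = post.split()
    return max(reference_set, key=lambda r: record_score(post_tokens, r))


def extract(post, record, threshold=0.7):
    """Step 2: label the post token closest to each attribute of the match.

    Attributes with no token at or above the (assumed) threshold are skipped.
    """
    post_tokens = post.split()
    labels = {}
    for attr, value in record.items():
        best = max(post_tokens, key=lambda t: similarity(t, value))
        if similarity(best, value) >= threshold:
            labels[attr] = best
    return labels


# A made-up post with misspellings and no grammar or fixed structure.
post = "93 hnda civc ex 4dr great cond $2500 obo"
match = align(post, REFERENCE_SET)
annotation = extract(post, match)
print(match)
print(annotation)
```

Because the matched record supplies the expected attribute values, misspelled tokens such as "hnda" can still be labeled without relying on grammar or token position. The sketch only handles single-word attribute values and non-empty posts; a real system would need a learned matching step and multi-token handling.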