Identifying named entities such as person, organization and product names in text is an important task in information extraction. In many domains, the same entity can be referred to in multiple ways because of variations introduced by different user groups, regional or cultural spelling differences, abbreviations, typographical errors and other conventions of usage. Recognizing a piece of text as a mention of an entity in such noisy data is difficult, even when a dictionary of possible entities is available. Previous approaches treat the synonym problem as part of entity disambiguation and apply learning-based methods that rely on the context of the words to identify synonyms. In this paper, we show that existing domain knowledge, encoded as rules, can be used effectively to address the synonym problem to a considerable extent. This simplifies the disambiguation task and reduces the need for large amounts of training data. We look at a subset of application scenarios in named entity extraction, categorize the possible variations in entity names, and define rules for each category. Using these rules, we generate synonyms for the canonical list of entity names and match these synonyms against the actual occurrences in the data sets. In particular, we describe the rule categories that we developed for several named entities and report the results of applying our synonym-generation technique to named entity extraction in two different domains.
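The generate-then-match idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the three rule functions, the suffix list, the spelling pairs, and the names `expand`, `generate_synonyms` and `find_mentions` are all assumptions chosen for illustration. Each rule maps a name to a set of variants, the rules are applied for a fixed number of rounds so that they can compose, and the resulting synonyms are matched against text with a case-insensitive dictionary lookup.

```python
import re
from typing import Callable, Dict, Iterable, List, Set, Tuple

Rule = Callable[[str], Set[str]]

# Hypothetical rule data; a real system would draw these from domain knowledge.
CORPORATE_SUFFIXES = {"inc", "inc.", "corp", "corp.", "corporation", "ltd", "ltd."}
SPELLING_PAIRS = [("colour", "color"), ("centre", "center")]


def abbreviation_rule(name: str) -> Set[str]:
    """Build an acronym from the initial letters of a multi-word name."""
    words = name.split()
    if len(words) < 2:
        return set()
    return {"".join(w[0] for w in words).upper()}


def suffix_drop_rule(name: str) -> Set[str]:
    """Drop a trailing corporate suffix such as 'Corporation' or 'Ltd'."""
    words = name.split()
    if len(words) > 1 and words[-1].lower() in CORPORATE_SUFFIXES:
        return {" ".join(words[:-1])}
    return set()


def regional_spelling_rule(name: str) -> Set[str]:
    """Swap known regional spellings in either direction (lowercased output)."""
    lower = name.lower()
    variants = set()
    for a, b in SPELLING_PAIRS:
        if a in lower:
            variants.add(lower.replace(a, b))
        if b in lower:
            variants.add(lower.replace(b, a))
    return variants


def expand(name: str, rules: Iterable[Rule], rounds: int = 2) -> Set[str]:
    """Apply every rule to the name and to newly found variants for `rounds` rounds."""
    rules = list(rules)
    variants, frontier = {name}, {name}
    for _ in range(rounds):
        produced: Set[str] = set()
        for variant in frontier:
            for rule in rules:
                produced |= rule(variant)
        frontier = produced - variants
        variants |= frontier
    return variants


def generate_synonyms(canonical: Iterable[str], rules: Iterable[Rule]) -> Dict[str, str]:
    """Map each lowercased synonym to its canonical name (first owner wins on clashes)."""
    rules = list(rules)
    index: Dict[str, str] = {}
    for name in canonical:
        for variant in expand(name, rules):
            index.setdefault(variant.lower(), name)
    return index


def find_mentions(text: str, index: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (surface form, canonical name) pairs for synonyms found in the text."""
    if not index:
        return []
    # Longest synonyms first so that longer names win over their own prefixes.
    alternation = "|".join(re.escape(k) for k in sorted(index, key=len, reverse=True))
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)
    return [(m.group(0), index[m.group(0).lower()]) for m in pattern.finditer(text)]


RULES = [abbreviation_rule, suffix_drop_rule, regional_spelling_rule]
index = generate_synonyms(["International Business Machines Corporation"], RULES)
print(find_mentions("IBM opened a lab.", index))
# prints [('IBM', 'International Business Machines Corporation')]
```

Running the rules for two rounds is what lets the suffix-drop and abbreviation rules compose into "IBM"; a single round would only yield "IBMC". Short acronyms generated this way can collide with unrelated words, so a practical system would likely need extra filtering or context checks on top of this lookup.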