Names: a new frontier in text mining

Authors:
Frankie Patman; Paul Thompson
Affiliations:
Language Analysis Systems, Inc., Herndon, VA; Institute for Security Technology Studies, Dartmouth College, Hanover, NH
Venue:
ISI '03: Proceedings of the 1st NSF/NIJ Conference on Intelligence and Security Informatics
Year:
2003

Abstract

Over the past 15 years, the government has funded research in information extraction with the goal of developing technology to extract entities, events, and their interrelationships from free text for further analysis. A crucial component of linking entities across documents is the ability to recognize when different name strings potentially refer to the same entity. Given the extraordinary range of variation that international names exhibit when rendered in the Roman alphabet, this is a daunting task. This paper surveys existing technologies for name matching and for accomplishing parts of the cross-document extraction and linking task. It proposes a direction for future work in which existing entity extraction, coreference, and database name matching technologies would be combined to provide cross-document coreference and linking. Extending name variant matching to free text will add important text mining functionality to intelligence and security informatics toolkits.
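The abstract names the name-matching problem but does not commit to a particular algorithm. To make the difficulty concrete, here is a minimal Python sketch of American Soundex, a classic phonetic key used in database name search. It is an illustration chosen for this example, not the authors' method. Two spellings are treated as potential variants when their keys are equal. The function name soundex and the sample names are hypothetical.

```python
# Minimal sketch of American Soundex (illustrative only; not the paper's method).
# Two name strings are treated as potential variants when their keys match.

# Consonant groups and their digit codes; vowels, y, h, and w are not coded.
_GROUPS = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
           ("l", "4"), ("mn", "5"), ("r", "6"))
CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(name: str) -> str:
    """Return the 4-character Soundex key for a name (empty string if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    key = first.upper()            # the first letter is kept verbatim
    prev = CODES.get(first, "")    # a following letter with the same code is skipped
    for ch in letters[1:]:
        if ch in "hw":
            continue               # h/w do not separate consonants with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code                # vowels (and y) reset prev, so repeats across them count
        if len(key) == 4:
            break
    return key.ljust(4, "0")       # pad short keys with zeros


def potential_variants(a: str, b: str) -> bool:
    """Hypothetical helper: flag two names as possible references to one entity."""
    return soundex(a) == soundex(b)


if __name__ == "__main__":
    for n in ("Mohammed", "Muhammad", "Qaddafi", "Gadhafi", "Kadafi"):
        print(n, soundex(n))
    # Mohammed M530, Muhammad M530, Qaddafi Q310, Gadhafi G310, Kadafi K310
```

Soundex conflates Mohammed and Muhammad, but it assigns Qaddafi, Gadhafi, and Kadafi different keys because it keeps the initial letter verbatim. This is one small instance of why transliteration variation defeats simple phonetic keys and motivates more language-aware matching.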