The problem of ethnicity identification from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on the development of an ethnicity classifier where all training data is extracted from public, non-confidential (and hence somewhat unreliable) sources. Our classifier uses hidden Markov models (HMMs) and decision trees to classify names into 13 cultural/ethnic groups, with per-group accuracy comparable to that of earlier binary (e.g., Spanish/non-Spanish) classifiers. We have applied this classifier to over 20 million names from a large-scale news corpus, identifying interesting temporal and spatial trends in the representation of particular cultural/ethnic groups.
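
To make the idea of scoring a name against per-group sequence models concrete, the following is a minimal sketch and not the classifier described above. It swaps in a plain character-bigram Markov chain with add-one smoothing in place of the HMMs and decision trees, and it picks the group whose model gives the name the highest log-likelihood. The class `CharBigramModel`, the function `classify`, the two group labels, and the toy name lists are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

ALPHABET_SIZE = 28  # assumption: 26 letters plus start (^) and end ($) markers


class CharBigramModel:
    """Character-bigram Markov chain with add-one smoothing.

    A simplified stand-in for an HMM: the states are the visible characters.
    """

    def __init__(self):
        self.bigrams = Counter()  # counts of (previous char, current char)
        self.totals = Counter()   # counts of previous char

    def train(self, names):
        for name in names:
            padded = "^" + name.lower() + "$"
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.totals[prev] += 1

    def log_likelihood(self, name):
        padded = "^" + name.lower() + "$"
        score = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.totals[prev] + ALPHABET_SIZE
            score += math.log(numerator / denominator)
        return score


def classify(name, models):
    """Return the label of the model that scores the name highest."""
    return max(models, key=lambda label: models[label].log_likelihood(name))


# Hypothetical toy training data; real training sets would be far larger.
TOY_TRAINING = {
    "hispanic": ["garcia", "martinez", "rodriguez", "hernandez", "lopez"],
    "japanese": ["nakamura", "tanaka", "yamamoto", "suzuki", "watanabe"],
}

models = {}
for label, names in TOY_TRAINING.items():
    model = CharBigramModel()
    model.train(names)
    models[label] = model

print(classify("fernandez", models))
print(classify("tanabe", models))
```

With so little training data the scores are only indicative; the point of the sketch is the structure, one generative model per group and an argmax over their likelihoods.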