Extracting entities (such as people or movies) from documents and identifying the categories they belong to (such as painter or writer) enable structured querying and data analysis over unstructured document collections. This paper focuses on the problem of categorizing extracted entities. Most prior approaches to this task analyze only the local document context in which an entity occurs. We significantly improve the accuracy of entity categorization by (i) considering an entity's context across all of the documents that contain it, and (ii) exploiting existing large lists of related entities (e.g., lists of actors, directors, or books). These approaches introduce computational challenges because (a) the context of each entity has to be aggregated across several documents and (b) the lists of related entities may be very large. We develop techniques to address these challenges, and we present a thorough experimental study on real data sets that demonstrates both the increase in accuracy and the scalability of our approaches.
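The two signals described in the abstract can be illustrated with a small sketch. This is not the paper's method, only a toy illustration under simplifying assumptions: entities and list members are single lowercase tokens, documents are whitespace-tokenized, and the function names `aggregate_context` and `list_cooccurrence` and the `window` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_context(documents, entities, window=2):
    """Pool the words near each entity mention across ALL documents.

    Assumption: entities are single lowercase tokens; `window` is the
    number of tokens taken on each side of a mention.
    """
    contexts = defaultdict(Counter)
    for doc in documents:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok in entities:
                lo = max(0, i - window)
                neighbors = tokens[lo:i] + tokens[i + 1:i + 1 + window]
                contexts[tok].update(neighbors)
    return contexts


def list_cooccurrence(documents, entities, related_lists):
    """Count, per entity and category, the documents in which the entity
    appears together with at least one other member of that category's list.

    Assumption: `related_lists` maps a category name to a set of tokens.
    """
    counts = defaultdict(Counter)
    for doc in documents:
        tokens = set(doc.lower().split())
        for entity in tokens & entities:
            for category, members in related_lists.items():
                # Ignore the entity itself if it happens to be on the list.
                if (tokens & members) - {entity}:
                    counts[entity][category] += 1
    return counts


documents = [
    "picasso painted guernica in 1937",
    "the painter picasso admired cezanne",
    "tolkien wrote the hobbit",
]
entities = {"picasso", "tolkien"}
related_lists = {
    "painter": {"cezanne", "monet"},
    "writer": {"hobbit", "silmarillion"},
}

contexts = aggregate_context(documents, entities)
cooccurrence = list_cooccurrence(documents, entities, related_lists)
```

In this toy data, the pooled context for `picasso` draws on two documents rather than one, and the co-occurrence counts link `picasso` to the painter list and `tolkien` to the writer list. A real system would feed such features to a classifier and would need indexing or sketching techniques to cope with large collections and large lists, which is the scalability concern the abstract raises.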