Named entity recognition studies the problem of locating and classifying parts of free text into a set of predefined categories. Although extensive research has focused on the detection of person, location and organization entities, there are many other entities of interest, including phone numbers, dates, times and currencies. We refer to these types of entities as "semi-structured named entities", since they usually follow certain syntactic formats according to some conventions, although their structure is typically not well-defined. Regular expression solutions require a significant amount of manual effort, and supervised machine learning approaches rely on large sets of labeled training data. Therefore, these approaches do not scale when we need to support many semi-structured entity types in many languages and regions. In this paper, we study this problem and propose a novel three-level bootstrapping framework for the detection of semi-structured entities. We describe the proposed techniques for phone, date and time entities, and perform extensive evaluations on English, German, Polish, Swedish and Turkish documents. Despite minimal input from the user, our approach achieves, on average, 95% precision and 84% recall for phone entities, and 94% precision and 81% recall for date and time entities. We also discuss implementation details and report run-time performance results, which show significant improvements over regular-expression-based solutions.