Editorial: Occupation inference through detection and classification of biographical activities

Authors:
Elena Filatova; John Prager
Affiliations:
Department of Computer and Information Sciences, Fordham University, 441 East Fordham Road, Bronx NY 10458, United States; IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, United States
Venue:
Data & Knowledge Engineering
Year:
2012

Abstract

Working with biographical information (e.g., generating biographies or answering biography-related questions) requires identifying the important activities in the life of the individual in question. Some activities can appear in any biography (e.g., a person was born on a particular date or lived in a particular location), but many activities used in biographies are occupation-related, and others are person-specific. Hence, a person's occupation gives important clues as to which activities should be included in the biography. In this paper, we present a methodology for identifying a three-level hierarchy of biographical activities: activities that apply to the general population, activities that are occupation-related, and activities that are person-specific. We use the obtained occupation-related activities as features for a multi-class SVM classifier that identifies the occupation of a previously unseen individual. We also show that activities automatically obtained from text can serve as features not only for classification but for clustering as well: given the correct number of clusters, people belonging to the same occupation are clustered together. Clustering people into a smaller number of classes, in turn, groups together practitioners of occupations that share a considerable number of occupation-related activities. Thus, by analyzing descriptions of people belonging to various occupations, we can build a hierarchy of occupations.
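To make the three-level hierarchy concrete, here is a minimal sketch in Python. It is not the method of the paper, whose details are not given in the abstract. It assumes each person is already reduced to a set of activity strings, uses a hypothetical `general_share` threshold to decide what counts as a general activity, and substitutes a simple overlap count for the multi-class SVM. The function names and the toy data are made up for illustration.

```python
from collections import defaultdict


def partition_activities(people, general_share=0.8):
    """Split activities into general, occupation-related, and person-specific.

    people: list of (name, occupation, set_of_activities) tuples.
    general_share: hypothetical threshold; an activity seen for at least
    this fraction of all people is treated as general.
    """
    holders = defaultdict(set)      # activity -> names of people who have it
    occupations = defaultdict(set)  # activity -> occupations it appears in
    for name, occupation, activities in people:
        for activity in activities:
            holders[activity].add(name)
            occupations[activity].add(occupation)

    general = set()
    by_occupation = defaultdict(set)
    person_specific = {}
    for activity, names in holders.items():
        if len(names) / len(people) >= general_share:
            general.add(activity)
        elif len(names) == 1:
            person_specific[activity] = next(iter(names))
        elif len(occupations[activity]) == 1:
            by_occupation[next(iter(occupations[activity]))].add(activity)
        # Activities shared by several occupations are left unassigned here.
    return general, dict(by_occupation), person_specific


def infer_occupation(activities, by_occupation):
    """Return the occupation whose activities overlap most with `activities`.

    A plain overlap count stands in for the SVM classifier; returns None
    when no occupation-related activity matches.
    """
    scores = {occ: len(acts & activities) for occ, acts in by_occupation.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Made-up toy data, for illustration only.
toy_people = [
    ("A", "composer", {"be born", "compose symphony", "conduct orchestra"}),
    ("B", "composer", {"be born", "compose symphony", "teach harmony"}),
    ("C", "physicist", {"be born", "publish paper", "win chess title"}),
    ("D", "physicist", {"be born", "publish paper", "run experiment"}),
    ("E", "physicist", {"be born", "run experiment"}),
]

general, by_occupation, person_specific = partition_activities(toy_people)
guess = infer_occupation({"be born", "run experiment", "write memoir"}, by_occupation)
```

On this toy input, "be born" lands in the general level, "compose symphony" is tied to composers, and "win chess title" is specific to person C. An unseen person who "ran an experiment" is assigned to the physicists. A real system would extract the activities from text and would likely need frequency-based weighting rather than hard set membership.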