Authoritative sources in a hyperlinked environment
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Probe, count, and classify: categorizing hidden web databases
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Hierarchically Classifying Documents Using Very Few Words
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Information Extraction: Techniques and Challenges
SCIE '97 International Summer School on Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology
WWW '03 Proceedings of the 12th international conference on World Wide Web
The automated acquisition of topic signatures for text summarization
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Extracting context to improve accuracy for HTML content extraction
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Producing biographical summaries: combining linguistic knowledge with corpus statistics
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Is it the right answer?: exploiting web redundancy for Answer Validation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Learning to paraphrase: an unsupervised approach using multiple-sequence alignment
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
An improved extraction pattern representation model for automatic IE pattern acquisition
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Counter-training in discovery of semantic patterns
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
A testbed for people searching strategies in the WWW
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Statistical acquisition of content selection rules for natural language generation
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
The Proposition Bank: An Annotated Corpus of Semantic Roles
Computational Linguistics
Discovering relations among named entities from large corpora
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Question answering using constraint satisfaction: QA-by-Dossier-with-Constraints
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Preemptive information extraction using unrestricted relation discovery
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Automatic creation of domain templates
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
On-demand information extraction
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
A parametric linguistics based approach for cross-lingual web querying
Data & Knowledge Engineering
Content Code Blurring: A New Approach to Content Extraction
DEXA '08 Proceedings of the 2008 19th International Conference on Database and Expert Systems Application
Conceptual equivalence for contrast mining in classification learning
Data & Knowledge Engineering
Structural, transitive and latent models for biographic fact extraction
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
The SemEval-2007 WePS evaluation: establishing a benchmark for the web people search task
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
Learning when training data are costly: the effect of class distribution on tree induction
Journal of Artificial Intelligence Research
Open information extraction from the web
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Automatically generating Wikipedia articles: a structure-aware approach
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
The Journal of Machine Learning Research
The Journal of Machine Learning Research
Seven principles for selecting software packages
Communications of the ACM
TCAM Razor: a systematic approach towards minimizing packet classifiers in TCAMs
IEEE/ACM Transactions on Networking (TON)
Information based data anonymization for classification utility
Data & Knowledge Engineering
Hi-index | 0.00 |
Dealing with biographical information (e.g., biography generation, answering biography-related questions, etc.) requires the identification of important activities in the life of the individual in question. While there are activities that can be used in any biography (e.g., person was born on a particular date, person lived in a particular location, etc.), many activities used in biographies tend to be occupation-related, others are person-specific. Hence, occupation gives important clues as to which activities should be included in the biography. In this paper, we present a methodology for identifying a three-level hierarchy of biographical activities: those activities that apply to the general population, those activities that are occupation-related, and those activities that are person-specific. We use the obtained occupation-related activities as features for a multi-class SVM classifier to identify the occupation of a previously unseen individual. We also show that the activities automatically obtained from text can be used as features not only for a classification task but for a clustering task as well. We show that, given the correct number of clusters, people belonging to the same occupation are clustered together. At the same time, clustering people into a smaller number of classes allows the grouping of practitioners of the occupations that share a considerable number of occupation-related activities. Thus, analyzing descriptions of people belonging to various occupations, we can build a hierarchy of occupations.