Wrapper induction: efficiency and expressiveness
Artificial Intelligence - Special issue on Intelligent internet systems
ACM SIGKDD Explorations Newsletter
Mining the Web's Link Structure
Computer
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Editorial: special issue on web content mining
ACM SIGKDD Explorations Newsletter
Gossip Galore: a self-learning agent for exchanging pop trivia
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Demonstrations Session
The SemEval-2007 WePS evaluation: establishing a benchmark for the web people search task
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
CU-COMSEM: exploring rich features for unsupervised web personal name disambiguation
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
PSNUS: web people name disambiguation by simple clustering with rich features
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
Unsupervised named-entity extraction from the Web: An experimental study
Artificial Intelligence
The WEKA data mining software: an update
ACM SIGKDD Explorations Newsletter
Domain adaptation of rule-based annotators for named-entity recognition tasks
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
We present a web mining system that clusters persons sharing the same name and extracts biographical information about them. The input to the system is the set of results returned by web search engine queries in English or Hungarian. To evaluate the system on English, we entered it (as RGAI) in the third Web People Search Task challenge [1]. Our approach differs from the others in three main ways: we focus on the raw textual parts of web pages rather than the structured parts, we group similar attribute classes together, and we explicitly handle the interdependencies among them. The RGAI system achieved top results on the person attribute extraction subtask and average results on the person clustering subtask. Following the shared task's annotation principles, we also manually constructed a Hungarian person disambiguation corpus and adapted our system from English to Hungarian; we report experimental results on this corpus as well.