This paper proposes a way to facilitate the use of the annotated web as a corpus by alleviating the annotation bottleneck for corpus data drawn from the web. To meet this need, we describe a framework for large-scale distributed corpus annotation using peer-to-peer (P2P) technology. We also propose to annotate a large reference corpus in order to evaluate this framework. This will allow us to investigate the affordances that distributed techniques offer for ensuring the replicability of linguistic research based on web-derived corpora.