This paper describes a method of comparing corpora that uses frequency profiling. The method can be used to discover key words that differentiate one corpus from another. Applied to annotated corpora, it can also discover key grammatical or word-sense categories. It offers a quick way in to the differences between corpora and is shown to have applications in the study of social differentiation in the use of English vocabulary, the profiling of learner English, and document analysis in the software engineering process.
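As a rough illustration of what frequency profiling can look like in practice, the sketch below builds a word-frequency list for each of two corpora and ranks words by how strongly their relative frequencies differ. The abstract does not name the statistic used, so the log-likelihood ratio here is an assumption (it is a common choice for this kind of comparison), and the names `log_likelihood` and `key_words` are hypothetical helpers, not part of the described method.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood ratio for one word (assumed statistic, see lead-in).

    a, b: frequency of the word in corpus 1 and corpus 2
    c, d: total number of tokens in corpus 1 and corpus 2
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_words(tokens1, tokens2, top=10):
    """Rank words by how strongly they differentiate two non-empty token lists.

    Returns rows of (word, freq1, freq2, score, direction), where direction
    is "+" if the word is relatively more frequent in corpus 1, "-" if it is
    relatively more frequent in corpus 2, and "=" if the rates are equal.
    """
    f1, f2 = Counter(tokens1), Counter(tokens2)
    n1, n2 = len(tokens1), len(tokens2)
    rows = []
    for word in set(f1) | set(f2):
        a, b = f1[word], f2[word]
        score = log_likelihood(a, b, n1, n2)
        if a / n1 > b / n2:
            direction = "+"
        elif a / n1 < b / n2:
            direction = "-"
        else:
            direction = "="
        rows.append((word, a, b, score, direction))
    # Highest scores first: these are the candidate key words.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    # Toy corpora, invented purely for illustration.
    corpus1 = "the system shall log the user request".split()
    corpus2 = "the user enjoys the quiet garden".split()
    for row in key_words(corpus1, corpus2, top=5):
        print(row)
```

The same comparison could be run over grammatical or word-sense tags instead of word forms by passing lists of tags rather than lists of tokens, assuming the corpora have already been annotated.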