Accurate methods for the statistics of surprise and coincidence
Computational Linguistics - Special issue on using large corpora: I
From grammar to lexicon: unsupervised learning of lexical syntax
Computational Linguistics - Special issue on using large corpora: II
Clustering verbs semantically according to their alternation behaviour
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Using predicate-argument structures for information extraction
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks
Computational Linguistics
Statistical filtering and subcategorization frame acquisition
EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Hi-index | 0.00 |
IRASubcat is a language-independent tool to acquire information about the subcategorization of verbs from corpus. The tool can extract information from corpora annotated at various levels, including almost raw text, where only verbs are identified. It can also aggregate information from a pre-existing lexicon with verbal subcategorization information. The system is highly customizable, and works with XML as input and output format. IRASubcat identifies patterns of constituents in the corpus, and associates patterns with verbs if their association strength is over a frequency threshold and passes the likelihood ratio hypothesis test. It also implements a procedure to identify verbal constituents that could be playing the role of an adjunct in a pattern. Thresholds controlling frequency and identification of adjuncts can be customized by the user, or else they are given a default value.