Automatic grammar induction and parsing free text: a transformation-based approach
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Counter-training in discovery of semantic patterns
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Proceedings of the 3rd international conference on Knowledge capture
Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Computational Linguistics
A semantic approach to IE pattern induction
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Crime profiling for the Arabic language using computational linguistic techniques
Information Processing and Management: an International Journal
Hi-index | 0.00 |
An unsupervised learning method, based on corpus linguistics and special language terminology, is described that can extract time-varying information from text streams. The method is shown to be 'language-independent' in that its use leads to sets of regular-expressions that can be used to extract the information in typologically distinct languages like English and Arabic. The method uses the information related to the distribution of N-grams, for automatically extracting 'meaning bearing' patterns of usage in a training corpus. The analysis of an English news wire corpus (1,720,142 tokens) and Arabic news wire corpus (1,720,154 tokens) show encouraging results.