Knowledge Discovery in Grammatically Analysed Corpora

Authors:
Sean Wallis; Gerald Nelson
Affiliations:
Survey of English Usage, University College London, UK. s.wallis@ucl.ac.uk; Department of English, University of Hong Kong, Hong Kong. ganelson@hkucc.hk
Venue:
Data Mining and Knowledge Discovery
Year:
2001

References (13):
On machine intelligence (2nd revised ed.).
Learning decision rules in noisy domains. Proceedings of Expert Systems '86, the 6th Annual Technical Conference on Research and Development in Expert Systems III.
C4.5: programs for machine learning.
Laddering: technique and tool use in knowledge acquisition. Knowledge Acquisition.
Knowledge acquisition from databases.
Editorial. Data Mining and Knowledge Discovery.
The CN2 Induction Algorithm. Machine Learning.
Syntactic Parsing as a Knowledge Acquisition Problem. EKAW '97 Proceedings of the 10th European Workshop on Knowledge Acquisition, Modeling and Management.
Knowledge Discovery in Databases: Exploiting Knowledge-Level Redescription. EKAW '96 Proceedings of the 9th European Knowledge Acquisition Workshop on Advances in Knowledge Acquisition.
Inductive Logic Programming for Natural Language Processing. ILP '96 Selected Papers from the 6th International Workshop on Inductive Logic Programming.
A simple rule-based part of speech tagger. ANLC '92 Proceedings of the Third Conference on Applied Natural Language Processing.
The Penn Treebank: annotating predicate argument structure. HLT '94 Proceedings of the Workshop on Human Language Technology.
Induction of first-order decision lists: results on learning the past tense of English verbs. Journal of Artificial Intelligence Research.

Cited by (4):
Semantic Role Parsing: Adding Semantic Structure to Unstructured Text. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining.
Support Vector Learning for Semantic Argument Classification. Machine Learning.
Language pattern analysis for automotive natural language speech applications. Proceedings of the 2nd International Conference on Automotive User Interfaces and Interactive Vehicular Applications.
Topics as contextual indicators for word choice in SMS conversations. SIGDIAL '11 Proceedings of the SIGDIAL 2011 Conference.

Abstract

Collections of grammatically annotated texts (corpora), and in particular parsed corpora, present a challenge to current methods of analysis. Such corpora are large, highly structured, heterogeneous data sources. In this paper we briefly describe the parsed one-million-word ICE-GB corpus and the ICECUP query system. We then consider the application of knowledge discovery in databases (KDD) to text corpora. Following Cupit and Shadbolt (Proceedings 9th European Knowledge Acquisition Workshop, EKAW '96; Berlin: Springer Verlag, pp. 245–261, 1996), we argue that effective linguistic knowledge discovery must be based on a process of redescription or, more precisely, abstraction, based on the research question to be investigated. Abstraction maps relevant elements from the corpus to an abstract model of the research topic. This mapping may be implemented using a grammatical query representation such as ICECUP's Fuzzy Tree Fragments (FTFs). Since this abstractive process must be both experimental and expert-guided, ultimately a workbench is necessary to maintain, evaluate and refine the abstract model. We conclude with a pilot study, employing our approach, into aspects of noun phrase postmodifying clause structure. The data is analysed using the UNIT machine learning algorithm to search for significant interactions between domain variables. We show that our results are commensurable with those published in the linguistics literature, and discuss how the methodology may be improved.
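
As a rough illustration of the abstraction step described in the abstract, the Python sketch below maps noun phrases with a postmodifying clause in toy parse trees to rows of two variables, then computes a Pearson chi-square statistic over the resulting table. Everything in it is an assumption made for illustration: the Node class, the function and category labels, and the two variables are hypothetical. It does not reproduce ICECUP's FTF matching or the UNIT algorithm, and the chi-square statistic is only a simple stand-in for a search for interactions between variables.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical parse-tree node: function, category, optional feature."""
    function: str
    category: str
    feature: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def abstract_cases(trees):
    """Map each NP postmodifying clause to (NP function, clause feature)."""
    cases = []
    for tree in trees:
        for node in walk(tree):
            if node.category != "NP":
                continue
            for child in node.children:
                if child.function == "postmodifier" and child.category == "clause":
                    cases.append((node.function, child.feature))
    return cases


def chi_square(cases):
    """Pearson chi-square statistic for the two-way table of cases."""
    n = len(cases)
    cell = Counter(cases)
    row = Counter(a for a, _ in cases)
    col = Counter(b for _, b in cases)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (cell[(a, b)] - expected) ** 2 / expected
    return stat
```

A larger statistic suggests that the two abstracted variables are not independent in the sample. A real study would also need a significance threshold and a principled choice of variables, both of which the abstract leaves to the expert-guided process.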