Pattern mining across domain-specific text collections

Authors:
Lee Gillam;Khurshid Ahmad
Affiliations:
Department of Computing, School of Electronics and Physical Sciences, University of Surrey, Guildford, United Kingdom;Department of Computing, School of Electronics and Physical Sciences, University of Surrey, Guildford, United Kingdom
Venue:
MLDM'05 Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2005

Abstract

This paper discusses a consistency in patterns of language use across domain-specific collections of text. We present a method for automatically identifying domain-specific keywords (specialist terms) by comparing language use in scientific, domain-specific text collections with language use in texts intended for a more general audience. The method supports the automatic production of collocational networks and of networks of concepts: thesauri, or so-called ontologies. It involves a novel combination of existing metrics from computational linguistics, which can enable the extraction, or learning, of these kinds of networks. The creation of ontologies or thesauri is informed by international (ISO) standards in terminology science, and the resulting resource can be used to support a variety of work, including data-mining applications.
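
The abstract does not name the metrics it combines, so the following is only a minimal sketch of the general idea of contrasting a domain collection with general-audience text. It assumes a simple ratio of relative frequencies as the comparison; the function names, the tokenizer, the `min_count` threshold, and the add-one smoothing are all illustrative choices, not the authors' method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic (optionally hyphenated) tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def rank_by_frequency_ratio(domain_texts, general_texts, min_count=2):
    """Rank words by how much more frequent they are in the domain texts.

    Assumption: the score is relative frequency in the domain collection
    divided by (smoothed) relative frequency in the general collection.
    This is a stand-in for the unspecified metrics in the abstract.
    """
    domain = Counter(tok for text in domain_texts for tok in tokenize(text))
    general = Counter(tok for text in general_texts for tok in tokenize(text))
    n_domain = sum(domain.values())
    n_general = sum(general.values())

    scores = {}
    for word, count in domain.items():
        if count < min_count:
            continue  # skip rare words, whose ratios are unreliable
        domain_rel = count / n_domain
        # Add-one smoothing so words absent from the general texts
        # do not cause a division by zero.
        general_rel = (general[word] + 1) / (n_general + 1)
        scores[word] = domain_rel / general_rel

    # Highest ratio first: the strongest candidates for specialist terms.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    domain_texts = [
        "the nanotube lattice and the nanotube wall",
        "the lattice of the nanotube",
    ]
    general_texts = [
        "the cat sat on the mat and the dog",
        "the weather is nice and the sun is out",
    ]
    for word, score in rank_by_frequency_ratio(domain_texts, general_texts):
        print(f"{word}\t{score:.2f}")
```

In this toy input, words that are common in both collections (such as "the") score close to 1, while words that are frequent only in the domain texts score well above 1. A real application would need far larger collections and a principled cut-off for what counts as a keyword.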