Expert Systems with Applications: An International Journal
In this article, we discuss a number of methods and tools to cluster a 7,000-document inventory in order to evaluate the impact of EU-funded research in the social sciences and humanities on EU policies. The inventory, which is not publicly available but was provided to us by the European Union (EU) in the framework of an EU project, could be divided into three main categories: research documents, influential policy documents, and policy documents. To present the results in a way that non-experts can use, we explored and compared two visualisation techniques, multi-dimensional scaling (MDS) and the self-organising map (SOM), along with one of the latter's derivatives, the U-matrix. Contrary to most other approaches, which analyse only document titles and abstracts, we performed a full-text analysis of more than 300,000 pages in total. Because many software suites cannot handle text mining problems of this size, we developed our own analysis platform. We show that the combination of a U-matrix and an MDS map, which is rarely used in text mining, reveals information that would otherwise go unnoticed. Furthermore, we show that combining a database, to store the data and the (intermediate) results, with a web server, to visualise the results, offers a powerful platform for analysing the data and sharing the results with all participants and collaborators involved in a data- and computation-intensive EU project, thereby guaranteeing the consistency of both data and results.
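As a rough illustration of the U-matrix mentioned above, the following minimal sketch computes, for each node of an already trained SOM, the mean distance between its weight vector and those of its grid neighbours. The abstract does not specify the grid topology, neighbourhood, or distance measure, so the rectangular grid, 4-connected neighbourhood, and Euclidean distance used here are assumptions, and the function name `u_matrix` and the toy `grid` are hypothetical.

```python
import math


def u_matrix(weights):
    """Return a grid holding, for each SOM node, the mean Euclidean distance
    between its weight vector and those of its 4-connected grid neighbours.

    `weights` is a rows x cols nested list of weight vectors (lists of floats).
    Assumption: rectangular grid, no wrap-around at the borders.
    """
    rows, cols = len(weights), len(weights[0])
    result = []
    for r in range(rows):
        row_values = []
        for c in range(cols):
            distances = []
            # Visit the up, down, left and right neighbours that exist.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    distances.append(math.dist(weights[r][c], weights[nr][nc]))
            row_values.append(sum(distances) / len(distances) if distances else 0.0)
        result.append(row_values)
    return result


# Toy 2 x 3 map: the two left columns share one prototype, the right column another.
grid = [
    [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]],
]
um = u_matrix(grid)
```

In this toy map the U-matrix values rise towards the boundary between the two groups of nodes. High values of this kind are conventionally read as borders between clusters, and low values as cluster interiors.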