Exploration of document relation quality with consideration of term representation basis, term weighting and association measure

Authors:
Nichnan Kittiphattanabawon;Thanaruk Theeramunkong;Ekawit Nantajeewarawat
Affiliations:
Sirindhorn International Institute of Technology, Thammasat University, Thailand;Sirindhorn International Institute of Technology, Thammasat University, Thailand;Sirindhorn International Institute of Technology, Thammasat University, Thailand
Venue:
PAISI'10 Proceedings of the 2010 Pacific Asia conference on Intelligence and Security Informatics
Year:
2010

Abstract

Tracking and relating news articles from several sources can help counter misinformation from deceptive news stories, since a single source cannot establish whether a piece of information is true. Preventing misinformation in a computer system is an interesting research topic in intelligence and security informatics. Association rule mining has recently been applied to this task because of its performance and scalability. This paper explores how the term representation basis, term weighting, and association measure affect the quality of relations discovered among news articles from several sources. Twenty-four combinations, formed from two term representation bases, four term weightings, and three association measures, are explored and their results compared with human judgement. A number of evaluations compare each combination’s performance with the others’ with regard to top-k ranks. The experimental results indicate that the combination of bigram (BG), term frequency with inverse document frequency (TFIDF), and confidence (CONF), as well as the combination of BG, TFIDF, and conviction (CONV), performs best at finding related documents, placing them in the upper ranks with a 0.41% rank-order mismatch on the top-50 mined relations. However, the combination of unigram (UG), TFIDF, and lift (LIFT) performs best at placing irrelevant relations in the lower ranks (top-1100), with a rank-order mismatch of 9.63%.
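
To make the three association measures named in the abstract concrete, the following is a minimal sketch using their standard textbook definitions for a rule X -> Y: confidence = supp(X ∪ Y) / supp(X), lift = confidence / supp(Y), and conviction = (1 - supp(Y)) / (1 - confidence). The toy data, the document IDs, the function names, and the choice to treat documents as items are all assumptions made for illustration. The paper may define support differently, for example by incorporating term weights.

```python
from fractions import Fraction

# Hypothetical toy data: each transaction is the set of document IDs
# that share one term. This framing is an assumption for illustration.
transactions = [
    frozenset({"d1", "d2"}),
    frozenset({"d1", "d2", "d3"}),
    frozenset({"d1"}),
    frozenset({"d3"}),
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(transactions, x, y):
    """supp(X u Y) / supp(X); raises ZeroDivisionError if supp(X) is 0."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


def lift(transactions, x, y):
    """confidence(X -> Y) / supp(Y); a value above 1 suggests positive association."""
    return confidence(transactions, x, y) / support(transactions, y)


def conviction(transactions, x, y):
    """(1 - supp(Y)) / (1 - confidence); infinite when confidence is 1."""
    conf = confidence(transactions, x, y)
    if conf == 1:
        return float("inf")
    return (1 - support(transactions, y)) / (1 - conf)


if __name__ == "__main__":
    for name, measure in [("CONF", confidence), ("LIFT", lift), ("CONV", conviction)]:
        print(name, measure(transactions, {"d1"}, {"d2"}))
```

Exact fractions are used here only so that the small example avoids floating-point rounding; a real pipeline over many articles would more likely use floats.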