Ranking Text Documents Based on Conceptual Difficulty Using Term Embedding and Sequential Discourse Cohesion

Authors:
Shoaib Jameel; Wai Lam; Xiaojun Qian
Venue:
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Year:
2012

Abstract

We propose a novel framework for determining the conceptual difficulty of a domain-specific text document without using any external lexicon. Conceptual difficulty refers to the reading difficulty of domain-specific documents. Previous approaches to the domain-specific readability problem have relied heavily on an external lexicon, which limits their scalability to other domains. Our model can be readily applied in domain-specific vertical search engines to re-rank documents according to their conceptual difficulty. We develop an unsupervised and principled approach for computing a term's conceptual difficulty in the latent space. Our approach also considers the transitions between consecutive segments of a document, capturing sequential discourse cohesion. It outperforms current state-of-the-art comparative methods.
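The abstract does not give the scoring function, so the following is only a rough illustration of how segment transitions in a latent space could drive a difficulty re-ranking. It assumes every term already has a latent-space vector (the toy two-dimensional vectors below are invented). Each segment is represented by the centroid of its term vectors. Low cosine similarity between consecutive segments is treated as weak cohesion and therefore as higher difficulty. The functions `transition_cost` and `rerank` are hypothetical names. This sketch is not the authors' model, which also computes a per-term conceptual difficulty that is omitted here.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Average a non-empty list of equal-length vectors component-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def transition_cost(segments: List[List[str]], emb: Dict[str, Vector]) -> float:
    """Hypothetical difficulty score: mean (1 - cosine) between the centroids
    of consecutive segments. Terms without an embedding are skipped."""
    centroids = []
    for seg in segments:
        vecs = [emb[t] for t in seg if t in emb]
        if vecs:
            centroids.append(mean_vector(vecs))
    if len(centroids) < 2:
        return 0.0  # no transition to measure
    costs = [1.0 - cosine(a, b) for a, b in zip(centroids, centroids[1:])]
    return sum(costs) / len(costs)


def rerank(docs: Dict[str, List[List[str]]], emb: Dict[str, Vector]) -> List[str]:
    """Order document ids from least to most difficult under this score."""
    return sorted(docs, key=lambda d: transition_cost(docs[d], emb))


# Toy usage: the embeddings and documents are invented for illustration.
emb = {
    "heart": [1.0, 0.0], "blood": [0.9, 0.1],
    "pump": [0.8, 0.2], "myocyte": [0.0, 1.0],
}
docs = {
    "easy": [["heart", "blood"], ["blood", "pump"]],
    "hard": [["heart", "pump"], ["myocyte"]],
}
print(rerank(docs, emb))  # ['easy', 'hard']
```

Averaging over transitions rather than summing keeps long and short documents on a comparable scale. A real system would learn the term vectors from the corpus instead of hand-writing them.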