This paper addresses the distribution of words and phrases in text, a problem of general interest and of importance for many practical applications. Existing models of word distribution represent observed sequences of words in text documents as the outcome of stochastic processes; the corresponding distributions of the number of word occurrences per document are modelled as mixtures of Poisson distributions whose parameter values are fitted to the data. We pursue a linguistically motivated approach to statistical language modelling and use observable text characteristics as model parameters. Multi-word technical terms, which are intrinsically content-bearing entities, are chosen for experimentation. Their occurrence and occurrence dynamics are investigated using a 100-million-word collection of about 13,000 technical documents of various kinds. The models describing word distribution in text are derived from a linguistic interpretation of the process of text formation, with the probabilities of word occurrence expressed as functions of observable and linguistically meaningful text characteristics. The adequacy of the proposed models for describing the actually observed distributions of words and phrases in text is confirmed experimentally. The paper has two focuses: modelling the distribution of content words and phrases among different documents, and modelling word occurrence dynamics within documents together with estimating the corresponding probabilities. Accordingly, the application areas for the new modelling paradigm include information retrieval and speech recognition.
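For readers unfamiliar with the Poisson-mixture family that the abstract attributes to existing models, the following is a minimal sketch of a generic finite Poisson mixture over per-document occurrence counts. It is not the model proposed in the paper. The function names, the two-component setup, and the numeric weights and rates are all hypothetical and chosen only for illustration.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k occurrences under a Poisson with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def poisson_mixture_pmf(k, weights, rates):
    """Probability of k occurrences under a finite mixture of Poissons.

    weights: mixing proportions, assumed to sum to 1
    rates:   one Poisson rate per component
    """
    if len(weights) != len(rates):
        raise ValueError("weights and rates must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * poisson_pmf(k, r) for w, r in zip(weights, rates))


# Hypothetical illustration: most documents barely mention the term
# (rate 0.1), while a minority are "about" it (rate 5.0).
example_weights = [0.9, 0.1]
example_rates = [0.1, 5.0]
example_distribution = [
    poisson_mixture_pmf(k, example_weights, example_rates) for k in range(10)
]
```

In such a mixture the weights and rates are free parameters fitted to observed counts, which is the property the abstract contrasts with using observable text characteristics as model parameters.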