Foundations of statistical natural language processing
Foundations of statistical natural language processing
Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Hi-index | 0.00 |
The paper defined an information measure associated with a topic or semantics for a keyword based corpus. Firstly, the topic-based corpus was obtained. Then the latent semantic vector space model of the corpus was established. After that, the information measure of the keyword was defined through the vector space model. Accordingly, it could be calculated that the amount of the topic information any document contained. Lastly, the membership degree which measured the degree of membership of the document belonging to the topic was introduced. Set a measurement threshold, thereby it was determined whether the documents belonging to the topic or not. Experiments show that the definition of the information measurement can get over the difficulty of the word-match search and real reach the goal of the Semantic-match search.