Phrase Detection in the Wikipedia
Focused Access to XML Documents
We combine techniques of XML Mining and Text Mining for the benefit of Information Retrieval. By manipulating the word sequence according to the XML structure of the marked-up text, we strengthen phrase boundaries so that they are more obvious to the algorithms that extract multiword sequences from text. Consequently, the quality of the indexed phrases improves, which has a positive effect on the average precision measured by the INEX 2007 standards.
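The idea of letting XML structure break the word sequence can be illustrated with a minimal sketch. It is not the authors' implementation: the `INLINE_TAGS` set, the function names `segments` and `bigrams`, and the rule that every non-inline element boundary ends a phrase are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: these tags are inline and do not interrupt a phrase.
INLINE_TAGS = {"b", "i", "em"}


def segments(xml_text):
    """Split the words of an XML fragment into segments at element boundaries.

    Every element whose tag is not in INLINE_TAGS starts and ends a segment,
    so no segment spans the boundary of such an element.
    """
    root = ET.fromstring(xml_text)
    result, current = [], []

    def flush():
        # Close the segment collected so far, if it holds any words.
        if current:
            result.append(list(current))
            current.clear()

    def walk(elem):
        is_block = elem.tag not in INLINE_TAGS
        if is_block:
            flush()
        current.extend((elem.text or "").split())
        for child in elem:
            walk(child)
            # Text following the child still belongs to the current element.
            current.extend((child.tail or "").split())
        if is_block:
            flush()

    walk(root)
    return result


def bigrams(segs):
    """Collect adjacent word pairs, never pairing words across segments."""
    return [(a, b) for seg in segs for a, b in zip(seg, seg[1:])]
```

With this sketch, a title followed by a paragraph yields two separate segments, so the last word of the title is never paired with the first word of the paragraph, while words inside an inline element such as `<i>` stay connected to their neighbours.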