Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
Recent trends in hierarchic document clustering: a critical review
Information Processing and Management: an International Journal
The effectiveness of a nonsyntatic approach to automatic phrase indexing for document retrieval
Journal of the American Society for Information Science
A critical investigation of recall and precision as measures of retrieval system performance
ACM Transactions on Information Systems (TOIS)
Predicting the performance of linearly combined IR systems
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Statistical phrases for vector-space information retrieval (poster abstract)
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Retrieving collocations from text: Xtract
Computational Linguistics - Special issue on using large corpora: I
Word association norms, mutual information, and lexicography
ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
Multiword unit hybrid extraction
MWE '03 Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment - Volume 18
User-chosen phrases in interactive query formulation for information retrieval
IRSG'98 Proceedings of the 20th Annual BCS-IRSG conference on Information Retrieval Research
Text document clustering based on frequent word meaning sequences
Data & Knowledge Engineering
XML-aided phrase indexing for hypertext documents
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Geometric algebra rotors for sub-symbolic coding of natural language sentences
KES'07/WIRN'07 Proceedings of the 11th international conference, KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part I
Building a topic hierarchy using the bag-of-related-words representation
Proceedings of the 11th ACM symposium on Document engineering
Extraction of multi-word expressions from small parallel corpora
Natural Language Engineering
Hi-index | 0.00 |
The growing amount of textual information available electronically has increased the need for high performance retrieval. The use of phrases was long seen as a natural way to improve retrieval performance over the common document models that ignore the sequential aspect of word occurrences in documents, considering them as "bags of words". However, both statistical and syntactical phrases showed disappointing results for large document collections. In this paper we present a recent type of multi-word expressions in the form of Maximal Frequent Sequences (Ahonen-Myka, 1999). Mined phrases rather than statistical or syntactical phrases, their main strengths are to form a very compact index and to account for the sequentiality and adjacency of meaningful word co-occurrences, by allowing for a gap between words. We introduce a method for using these phrases in information retrieval and present our experiments. They show a clear improvement over the well-known technique of extracting frequent word pairs.