Researchers in indexing and retrieval systems have long advocated including more contextual information to improve results. The proliferation of full-text databases and advances in computer storage capacity have made it possible to analyze text using both linguistic and extralinguistic knowledge. Since the mid-1980s, research has paid increasing attention to context, giving discourse analysis a more central role. The research presented in this paper examines whether discourse variables have an impact on modern information retrieval and classification algorithms. To evaluate this hypothesis, a functional framework for information analysis in an automated environment is proposed, in which n-gram filtering and two classification algorithms, k-means and Chen's algorithm, are tested against sub-collections of documents built around the following discourse variables: "Genre", "Register", "Domain terminology", and "Document structure". The results obtained with each algorithm on the different sub-collections were compared to the MeSH information structure. They show that n-gram filtering does not appear to depend clearly on discourse variables, that the k-means classification algorithm does depend on them, but only on domain terminology and document structure, and that Chen's algorithm depends clearly on all of the discourse variables. This information could be used to design better classification algorithms that take discourse variables into account. Other minor conclusions drawn from these results are also presented.
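The comparison described above can be illustrated with a minimal sketch. It assumes character trigram profiles as the n-gram representation and cluster purity as the agreement score against reference categories such as MeSH headings. Both are illustrative assumptions, as are the function names `char_ngrams`, `cosine`, and `purity`; the abstract does not state which n-gram unit or agreement measure the study used.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in whitespace-normalized, lowercased text."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def purity(clusters, labels):
    """Share of documents that carry the majority reference label of their cluster.

    `clusters` holds the cluster id an algorithm assigned to each document;
    `labels` holds the reference category of each document (hypothetically,
    a MeSH heading). Purity is an assumed measure, not the paper's own.
    """
    by_cluster = {}
    for cluster_id, label in zip(clusters, labels):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


if __name__ == "__main__":
    # Toy data: the same algorithm run on two hypothetical sub-collections.
    reference = ["heart", "heart", "lung", "lung"]
    runs = {
        "structured abstracts": [0, 0, 1, 1],
        "unstructured abstracts": [0, 1, 1, 1],
    }
    for name, assigned in runs.items():
        print(name, purity(assigned, reference))
```

Under these assumptions, a large gap in the score between sub-collections that differ in a single discourse variable would suggest that the algorithm is sensitive to that variable.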