The Web contains a tremendous amount of information. It is challenging to determine which Web documents are relevant to a user query, and even more challenging to rank them by degree of relevance. In this paper, we propose a probabilistic retrieval model that uses logistic regression to recognize multiple-record Web documents against an application ontology (a simple conceptual-modeling approach). We observe that many Web documents contain a sequence of chunks of textual information, each of which constitutes a “record.” Documents of this type are referred to as multiple-record documents. In our categorization approach, a document is represented by a set of term frequencies of index terms, a density heuristic value, and a grouping heuristic value. We first apply logistic regression analysis to the relevance probabilities using the (i) index terms, (ii) density value, and (iii) grouping value of each training document. Thereafter, the relevance probability of each test document is interpolated from the fitted curves. In contrast to other probabilistic retrieval models, our model makes only a weak independence assumption and is capable of handling any important dependence relationships among index terms. In addition, we use logistic regression instead of linear regression because the relevance probabilities of the training documents are discrete. On one test set of car-ad Web documents and another of obituary Web documents, our probabilistic model achieves an average recall of 100%, precision of 83.3%, and accuracy of 92.5%.
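The general idea of fitting a logistic curve to document features and then scoring unseen documents can be illustrated with a minimal sketch. It is not the paper's implementation: the three-value feature layout (an aggregate index-term score, a density value, a grouping value), the toy numbers, the 0.5 decision threshold, the plain gradient-descent fit, and all function names are assumptions made purely for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated relevance probability of one document."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def evaluate(y_true, y_pred):
    """Return (recall, precision, accuracy) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return recall, precision, accuracy


# Hypothetical training documents: [index-term score, density, grouping],
# each scaled to [0, 1]. Label 1 = relevant to the ontology, 0 = not.
X_train = [
    [0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.7, 0.7, 0.8],
    [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.3, 0.2, 0.1],
]
y_train = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X_train, y_train)

# Score hypothetical unseen documents and threshold at 0.5 (an assumption).
X_test = [[0.85, 0.9, 0.8], [0.1, 0.1, 0.2]]
y_test = [1, 0]
y_pred = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X_test]
print(evaluate(y_test, y_pred))
```

Because the model outputs a probability rather than a hard label, the same scores can also be used to rank documents by estimated relevance instead of merely classifying them.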