CAINES (Content Analysis and INformation Extraction System) applies a semantic-based information extraction (IE) methodology, developed through a design science approach, to extract information from unstructured text on the Web. The system was knowledge-engineered and tested on an active business database by experts who use that database regularly in their jobs. We believe that involving business experts this heavily advances our thinking about IS research. CAINES extracts information to meet three objectives that financial experts who regularly analyze financial reports deemed important: (1) understand which current market conditions affected the growth of certain balance sheets; (2) summarize management's discussion of potential risks and uncertainties; and (3) identify significant financial activities, including mergers, acquisitions, and new business segments. The study used 21 online business reports from the EDGAR database, averaging about 100 pages each. Extraction rules were written from the financial experts' opinions. With CAINES, one can extract information about global and domestic market conditions, the impacts of those conditions, and the business outlook. User testing of CAINES yielded a recall of 85.91%, a precision of 87.16%, and an F-measure of 86.46%. Extraction with CAINES was also faster than manual extraction. Users agreed that CAINES quickly and easily extracts unstructured information from financial reports in the EDGAR database. This study highlights the value of building a semantic-based IE system that draws on the knowledge of business experts to address a practical business problem.
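For readers unfamiliar with the evaluation measures quoted above, the following is a minimal sketch of how precision, recall, and a balanced F-measure are conventionally computed for an extraction task. The abstract does not say which F-measure variant or averaging scheme the study used, so the balanced F1 here is an assumption, and the function name and the sentence identifiers are hypothetical illustrations rather than data from the study.

```python
def precision_recall_f1(extracted, relevant):
    """Return (precision, recall, F1) for extracted vs. relevant items.

    Assumes the balanced F-measure (harmonic mean of precision and
    recall); the study may have used a different variant.
    """
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: sentence IDs returned by an extractor versus
# the IDs that an expert marked as relevant.
system_output = {"s1", "s2", "s3", "s4"}
expert_marked = {"s2", "s3", "s4", "s5", "s6"}
p, r, f = precision_recall_f1(system_output, expert_marked)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Treating extracted items as sets keeps the sketch short; a real evaluation would also need a rule for deciding when an extracted passage matches an expert-marked one.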