Automatic extraction of document metadata is an important task in a world where thousands of documents are just one "click" away, and powerful indices are necessary to support effective retrieval. The upcoming XML standard represents an important step in this direction, as its semistructured representation conveys document metadata together with the text of the document. For example, retrieving scientific papers by author or affiliation would be a straightforward task if papers were stored in XML. Unfortunately, the vast majority of documents on the web today are available in forms that do not carry additional semantics. Converting existing documents to a semistructured representation is time-consuming, and no automatic process can be easily applied. In this paper we discuss a system, based on a novel spatial/visual knowledge principle, for extracting metadata from scientific papers stored as PostScript files. The system embeds general knowledge about the graphical layout of a scientific paper to guide the metadata extraction process, and it can effectively assist automatic index creation for digital libraries.
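The abstract does not spell out the layout rules the system uses, so the following is only a minimal sketch of what layout-guided extraction could look like, not the authors' method. It assumes the PostScript file has already been converted into positioned text blocks; the `TextBlock` type, the `extract_header_metadata` function, the "top third of the page" cutoff, and the font-size rules are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextBlock:
    """A run of text with its position on the first page (hypothetical type)."""
    text: str
    x: float          # left edge, in points
    y: float          # distance from the top of the page, in points
    font_size: float  # in points


def extract_header_metadata(
    blocks: List[TextBlock], page_height: float = 792.0
) -> Dict[str, object]:
    """Guess title and authors from layout alone (illustrative heuristic).

    Assumed layout knowledge: the title is the largest text in the top
    third of the first page, and the authors are set in the next-largest
    font size directly below it.
    """
    # Restrict the search to the top third of the page.
    header = [b for b in blocks if b.y < page_height / 3]
    if not header:
        return {"title": None, "authors": []}

    # Title: largest font size; ties go to the block nearest the top.
    title = max(header, key=lambda b: (b.font_size, -b.y))

    # Candidate author blocks: below the title and in a smaller font.
    below = [b for b in header if b.y > title.y and b.font_size < title.font_size]
    if not below:
        return {"title": title.text, "authors": []}

    # Authors: the largest remaining font size, read top-down, left-to-right.
    author_size = max(b.font_size for b in below)
    authors = [
        b.text
        for b in sorted(below, key=lambda b: (b.y, b.x))
        if b.font_size == author_size
    ]
    return {"title": title.text, "authors": authors}


if __name__ == "__main__":
    # Made-up blocks standing in for the first page of a paper.
    page = [
        TextBlock("A Layout-Based Extractor", 100, 60, 18),
        TextBlock("Jane Doe", 120, 100, 12),
        TextBlock("John Roe", 300, 100, 12),
        TextBlock("Example University", 200, 120, 10),
        TextBlock("Abstract text...", 72, 400, 10),
    ]
    print(extract_header_metadata(page))
```

The heuristic never inspects the words themselves; it relies only on position and font size, which is the kind of general layout knowledge the abstract refers to. A practical system would likely need further rules for affiliations, multi-line titles, and multi-column layouts.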