Interoperability for digital libraries worldwide
Communications of the ACM
Making metadata: a study of metadata creation for a mixed physical-digital collection
Proceedings of the third ACM conference on Digital libraries
Inductive learning algorithms and representations for text categorization
Proceedings of the seventh international conference on Information and knowledge management
Distributional clustering of words for text classification
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Making large-scale support vector machine learning practical
Advances in kernel methods
An introduction to support vector machines and other kernel-based learning methods
A statistical learning model of text classification for support vector machines
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
CITIDEL: making resources available
Proceedings of the 7th annual conference on Innovation and technology in computer science education
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Automating the Construction of Internet Portals with Machine Learning
Information Retrieval
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
A maximum entropy approach to information extraction from semi-structured and free text
Eighteenth national conference on Artificial intelligence
eBizSearch: an OAI-compliant digital library for eBusiness
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
Federating heterogeneous digital libraries by metadata harvesting
A divisive information theoretic feature clustering algorithm for text classification
The Journal of Machine Learning Research
Knowledge-free induction of inflectional morphologies
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Use of support vector learning for chunk identification
CoNLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Entity extraction without language-specific resources
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Use of support vector machines in extended named entity recognition
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Panorama: extending digital libraries with topical crawlers
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Generating fuzzy semantic metadata describing spatial relations from images using the R-histogram
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Metaextract: an NLP system to automatically assign metadata
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Two supervised learning approaches for name disambiguation in author citations
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Automatic extraction of titles from general documents using machine learning
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Finding a catalog: generating analytical catalog records from well-structured digital texts
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Developing practical automatic metadata assignment and evaluation tools for internet resources
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Name disambiguation in author citations using a K-way spectral clustering method
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Rule-based word clustering for document metadata extraction
Proceedings of the 2005 ACM symposium on Applied computing
As we may perceive: inferring logical documents from hypertext
Proceedings of the sixteenth ACM conference on Hypertext and hypermedia
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Proceedings of the 3rd international conference on Knowledge capture
A new approach to intranet search based on information extraction
Proceedings of the 14th ACM international conference on Information and knowledge management
Automatic categorization of figures in scientific documents
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Information extraction from research papers using conditional random fields
Information Processing and Management: an International Journal
Automatic extraction of titles from general documents using machine learning
Information Processing and Management: an International Journal
Efficient optimization of support vector machine learning parameters for unbalanced datasets
Journal of Computational and Applied Mathematics
Reference metadata extraction using a hierarchical knowledge representation framework
Decision Support Systems
Web page title extraction and its application
Information Processing and Management: an International Journal
Integrating data and text mining processes for digital library applications
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
FLUX-CIM: flexible unsupervised extraction of citation metadata
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Towards automatic conceptual personalization tools
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Collaboration over time: characterizing and modeling network evolution
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
International Journal of Metadata, Semantics and Ontologies
Enabling ontology-based document classification and management in ebXML registries
Proceedings of the 2008 ACM symposium on Applied computing
A metadata generation system for scanned scientific volumes
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Using Data Mining Methods to Predict Personally Identifiable Information in Emails
ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Private Data Discovery for Privacy Compliance in Collaborative Environments
CDVE '08 Proceedings of the 5th international conference on Cooperative Design, Visualization, and Engineering
Automatic Extraction of Pedagogic Metadata from Learning Content
International Journal of Artificial Intelligence in Education
Automatic metadata generation using associative networks
ACM Transactions on Information Systems (TOIS)
Automatic metadata extraction from museum specimen labels
DCMI '08 Proceedings of the 2008 International Conference on Dublin Core and Metadata Applications
CEBBIP: a parser of bibliographic information in chinese electronic books
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
Paper Annotation with Learner Models
Proceedings of the 2005 conference on Artificial Intelligence in Education: Supporting Learning through Intelligent and Socially Informed Technology
A General Learning Method for Automatic Title Extraction from HTML Pages
MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
Automated document metadata extraction
Journal of Information Science
Bridging the Gap between Linked Data and the Semantic Desktop
ISWC '09 Proceedings of the 8th International Semantic Web Conference
Automated template-based metadata extraction architecture
ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
Using automatic metadata extraction to build a structured syllabus repository
ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
Searching for ground truth: a stepping stone in automating genre classification
DELOS'07 Proceedings of the 1st international conference on Digital libraries: research and development
oreChem ChemXSeer: a semantic digital library for chemistry
Proceedings of the 10th annual joint conference on Digital libraries
WebApps'10 Proceedings of the 2010 USENIX conference on Web application development
Louhi '10 Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents
ICADL'10 Proceedings of the role of digital libraries in a time of global change, and 12th international conference on Asia-Pacific digital libraries
SciPlore Xtract: extracting titles from scientific PDF documents by analyzing style information
ECDL'10 Proceedings of the 14th European conference on Research and advanced technology for digital libraries
Automatic mining of cognitive metadata using fuzzy inference
Proceedings of the 22nd ACM conference on Hypertext and hypermedia
On identifying academic homepages for digital libraries
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
PDFMeat: managing publications on the semantic desktop
Proceedings of the 20th ACM international conference on Information and knowledge management
Efficient name disambiguation for large-scale databases
PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
Semantic search in the World News domain using automatically extracted metadata files
Knowledge-Based Systems
Genre classification in automated ingest and appraisal metadata
ECDL'06 Proceedings of the 10th European conference on Research and Advanced Technology for Digital Libraries
Towards next generation citeseer: a flexible architecture for digital library deployment
ECDL'06 Proceedings of the 10th European conference on Research and Advanced Technology for Digital Libraries
iASA: learning to annotate the semantic web
Journal on Data Semantics IV
Header metadata extraction from semi-structured documents using template matching
OTM'06 Proceedings of the 2006 international conference on On the Move to Meaningful Internet Systems: AWeSOMe, CAMS, COMINF, IS, KSinBIT, MIOS-CIAO, MONET - Volume Part II
Automatic metadata mining from multilingual enterprise content
Web Semantics: Science, Services and Agents on the World Wide Web
Data mining with parallel support vector machines for classification
ADVIS'06 Proceedings of the 4th international conference on Advances in Information Systems
ICWL'07 Proceedings of the 6th international conference on Advances in web based learning
Digital Preservation in Grids and Clouds: A Middleware Approach
Journal of Grid Computing
Building a document genre corpus: a profile of the KRYS I corpus
IRSG'08 Proceedings of the 2008 BCS-IRSG conference on Corpus Profiling
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Web-based citation parsing, correction and augmentation
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
A comparison of metadata extraction techniques for crowdsourced bibliographic metadata management
Proceedings of the 27th Annual ACM Symposium on Applied Computing
A comparison of layout based bibliographic metadata extraction techniques
Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
Self-supervised learning approach for extracting citation information on the web
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Content independent metadata production as a machine learning problem
MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
Assessing quality dynamics in unsupervised metadata extraction for digital libraries
ECDL'07 Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries
Logical Structure Recovery in Scholarly Articles with Rich Document Features
International Journal of Digital Library Systems
Extracting and matching authors and affiliations in scholarly documents
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Docear's PDF inspector: title extraction from PDF files
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Searching online book documents and analyzing book citations
Proceedings of the 2013 ACM symposium on Document engineering
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classification-based method for extracting metadata from the header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure then improves the line classification by using the class labels predicted for each line's neighboring lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries within each line. We found that discovering and using the structural patterns of the data, together with domain-based word clustering, can improve metadata extraction performance. Appropriate feature normalization also greatly improves classification performance. Our metadata extraction method was originally designed to improve the metadata extraction quality of the digital libraries Citeseer [17] and EbizSearch [24]. We believe it can be generalized to other digital libraries.
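The two-stage line classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the four labels, the sample header, and the rule-based `toy_classifier` (a stand-in for trained SVMs) are all assumptions made for illustration, and each line receives a single label rather than one or more of the 15 classes. Only the control flow reflects the abstract: classify each line independently, then reclassify using the labels predicted for neighboring lines in the previous round until the labels stop changing.

```python
from typing import Callable, List, Sequence

# Hypothetical classifier signature: (line, previous label, next label) -> label.
Classifier = Callable[[str, str, str], str]


def toy_classifier(line: str, prev_label: str, next_label: str) -> str:
    """Toy rules standing in for a trained model (assumption, not the paper's SVMs)."""
    lowered = line.lower()
    if "@" in line:
        return "email"
    if "university" in lowered or "department" in lowered:
        return "affiliation"
    # Contextual rule: a short line directly above an affiliation line
    # is read as an author line.
    if next_label == "affiliation" and len(line.split()) <= 3:
        return "author"
    return "title"


def classify_header(
    lines: Sequence[str], classify: Classifier, max_rounds: int = 10
) -> List[str]:
    """Label header lines, then refine the labels using neighbor predictions."""
    # Round 0: independent classification, with no neighbor information yet.
    labels = [classify(line, "unknown", "unknown") for line in lines]
    for _ in range(max_rounds):
        new_labels = []
        for i, line in enumerate(lines):
            # Neighbor labels come from the previous round only.
            prev_label = labels[i - 1] if i > 0 else "none"
            next_label = labels[i + 1] if i + 1 < len(lines) else "none"
            new_labels.append(classify(line, prev_label, next_label))
        if new_labels == labels:  # converged: no label changed this round
            break
        labels = new_labels
    return labels


# Made-up header used only to exercise the sketch.
HEADER = [
    "A Study of Header Parsing",
    "Jane Doe",
    "Department of Computer Science",
    "jdoe@example.edu",
]

if __name__ == "__main__":
    for label, line in zip(classify_header(HEADER, toy_classifier), HEADER):
        print(f"{label:12s} {line}")
```

In this toy run the second line is first labeled as a title, because the independent pass has no context, and is relabeled as an author line once the affiliation label of the following line becomes available. Reading neighbor labels only from the previous round keeps each round independent of the order in which lines are processed.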