Creating a New and Enhanced Content and Metadata Extraction System for CiteSeerX
- Author:
- Killian, Jason
- Published:
- [University Park, Pennsylvania] : Pennsylvania State University, 2015.
- Physical Description:
- 1 electronic document
- Additional Creators:
- Giles, C. Lee
- Schreyer Honors College
- Access Online:
- honors.libraries.psu.edu
- Restrictions on Access:
- Open Access.
- Summary:
- One challenge in the world of digital libraries is obtaining accurate data and metadata from sources where this information is not easily accessible in digital form. In the case of CiteSeerX, a digital library search engine for scholarly documents, this challenge manifests itself in extracting metadata and content from scholarly PDF files. CiteSeerX's current extraction system is functional but could be improved in many areas. This thesis presents the architecture and design of a new extraction system that addresses the issues of the current one.
- Genre(s):
- Dissertation Note:
- B.S. Pennsylvania State University 2015.
- Technical Details:
- The full text of the dissertation is available as an Adobe Acrobat .pdf file ; Adobe Acrobat Reader required to view the file.
catkey: 15439975