From Early Modern Printing to Post-Modern Indie Publishing: Using eMOP on AFP

Christy, Matthew; Hecker, Jennifer

dc.contributor.author	Christy, Matthew
dc.contributor.author	Hecker, Jennifer
dc.date.accessioned	2016-06-16T21:55:17Z
dc.date.available	2016-06-16T21:55:17Z
dc.date.issued	2015-04-10
dc.identifier.uri	http://hdl.handle.net/10106/25721
dc.description	Twenty-Minute Presentation	en_US
dc.description.abstract	The Early Modern OCR Project (eMOP) is a Mellon Foundation grant-funded project whose goal is to improve optical character recognition (OCR) output for early modern printed English texts by utilizing and creating open-source tools and workflows. In addition to establishing an impressive OCR workflow infrastructure, eMOP has produced several open-source post-processing tools to evaluate and improve the text output of Google’s Tesseract OCR engine. Work on eMOP is expected to be completed this summer, and the team is now looking to apply its accrued proficiency to other projects. The Austin Fanzine Project (AFP) started as a relatively straightforward digitization and transcription project, but has blossomed into a sandbox for creative experimentation with digital archives and digital humanities methods and tools. To date, the project volunteers have digitized fanzines and posted the resulting downloadable files; researched digital archives best practices and crafted project policies; experimented with crowd-sourced transcription and indexing using new open-source software; and explored ways to virtually visualize the connections in real-life communities via maps, e-books, and audio tours. A current focus of the project is demonstrating ways zines can be used in digital scholarship projects to illuminate parts of the culture not documented by mainstream publications, thereby illustrating the value of investing in zine collections. At first blush, these two projects would seem to have little in common. However, many of the challenges faced by eMOP are mirrored in AFP, as are both projects’ commitments to innovation, openness, and crowd-sourced solutions. The printing process in the hand-press period (roughly 1475-1800) often produced texts with fluctuating baselines, mixed fonts, and varied concentrations of ink, among many other variables. Varying paper quality also commonly led to ink bleedthrough, inconsistent glyph shapes, and other problems.
Combining these factors with the poor quality of the images produced via digitization (in Early English Books Online (EEBO) and, to a lesser extent, Eighteenth Century Collections Online (ECCO)) creates significant challenges for OCR software attempting to recognize the text content of these images. Similarly, fanzines, often hand-made and featuring irregular layouts, can present the same kinds of problems for OCR engines. The authors were interested in exploring how the tools and workflows created for eMOP could be utilized in the Austin Fanzine Project. They were also eager to see how tools like From the Page could further innovate the workflow on projects like AFP. The authors will present on the challenges faced and lessons learned in modifying tools and workflows designed for early modern print documents to work with hand-made fanzines, and the ways in which different tools can be used to create innovative and collaborative work and research spaces.	en_US
dc.language.iso	en_US	en_US
dc.subject	Early Modern OCR Project (eMOP)	en_US
dc.subject	Optical Character Recognition	en_US
dc.subject	OCR	en_US
dc.subject	Open source tools	en_US
dc.subject	Austin Fanzine Project (AFP)	en_US
dc.subject	Digitization	en_US
dc.subject	Fanzine	en_US
dc.subject	Transcription -- Digital Humanities	en_US
dc.title	From Early Modern Printing to Post-Modern Indie Publishing: Using eMOP on AFP	en_US
dc.type	Presentation	en_US

Files in this item

Name: Christy_Hecker.jpg
Size: 489.3Kb
Format: JPEG image
Description: JPEG

This item appears in the following Collection(s)

TXDHC 2015 Presenter Abstracts
