
dc.contributor.author: Alpert-Abrams, Hannah
dc.contributor.author: Garrette, Dan
dc.date.accessioned: 2016-05-02T21:39:16Z
dc.date.available: 2016-05-02T21:39:16Z
dc.date.issued: 2015-04-10
dc.identifier.uri: http://hdl.handle.net/10106/25662
dc.description: Poster Presentation [en_US]
dc.description.abstract: The PDF images in the Primeros Libros digital collection, an effort to produce digital facsimiles of all books printed before 1601 in the Americas, pose several challenges for Optical Character Recognition (OCR) systems. The Ocular system, designed by Taylor Berg-Kirkpatrick et al., jointly models the physical operation of hand-press printing and the language of the written document, allowing it to ‘learn’ to read early printed books. Ocular cannot, however, handle the orthographic variation and code-switching prevalent in the American context. Working with PDF images of trilingual texts in Spanish, Latin, and Nahuatl, we set out to modify Ocular for use on the Primeros Libros collection. In this paper, we present our OCR tool for the Primeros Libros collection, an extension of Ocular which can handle multilingual documents, and which includes an interface for the incorporation of orthographic idiosyncrasies. At the same time, we argue for a situated analysis of digitization tools which considers Ocular's statistical models within the context of the Primeros Libros collection. As Walter Mignolo has shown, books from early colonial Mexico embody a larger project of language codification which was deeply embedded in the colonization and religious conversion of New Spain. The mathematical simplicity of Ocular's statistical models suggests a neutral engagement with the text that disguises a deep engagement with these colonial processes. Automatic transcription in this context becomes a process with significant implications for the ideological positioning of digitization projects. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: OCR [en_US]
dc.subject: Digital Facsimiles [en_US]
dc.subject: Ocular [en_US]
dc.subject: Digital Collections [en_US]
dc.subject: Scanning [en_US]
dc.title: Automatic Transcription in Colonial Contexts: OCR for the Primeros Libros [en_US]
dc.type: Presentation [en_US]

