ProQuest Joins Forces with TAMU Scholars to Make 15th Century Books Behave Like Born-Digital Text
Collaborative project will train OCR technology to read early modern fonts
Information powerhouse ProQuest is participating in a project that will vastly accelerate research into 15th through 17th Century cultural history. The company will provide access to page images from the venerable Early English Books Online and newcomer Early European Books to the Early Modern OCR Project (eMOP) at Texas A&M. eMOP will use the content to create a database of typefaces used in the early modern era, train OCR software to read them, and then apply crowd-sourcing to edit the results. The project will turn the rich corpus of works from this pivotal historical period into fully searchable digital documents.
“Digitization of the historical archives of the early modern era made this literature far more accessible. Page images provide scholars with unprecedented access to books that previously could have only been viewed in their source library. However, precision search — the ability to use technology to zero in on very specific text — has been hampered by the fact that OCR technology can’t read the peculiarities of early printing,” said Mary Sauer-Games, ProQuest vice-president, publishing. “We’re thrilled to participate in an effort that we feel will drive new levels of historical discovery. We love the application of modern ingenuity to turn these very old archives into works that are as searchable as text that was born digital.”
ProQuest has played a key worldwide role in preserving and providing access to early modern history, ensuring the survival of printed works from as early as 1450. In the 1930s, the company became a pioneer of microfilm when it filmed the contents of the vast archives of the British Library and other major libraries across England, capturing virtually every English-language book printed in the 15th, 16th and 17th centuries. The microfilm collection, ProQuest’s flagship Early English Books, opened these works to global study and created an avenue for preservation. It has since become the quintessential collection for study of the early modern era.
In the 1990s, ProQuest began a massive effort to capture the collection digitally. Early English Books Online enables scholars to manage, share and collaborate on their research virtually. The company even created a social network that allows the scholars who use the collection as a base for their research to connect with each other.
Then, early in the 21st century, ProQuest expanded the program to include major European libraries, launching Early European Books with the Danish Royal Library in Copenhagen and the Biblioteca Nazionale Centrale di Firenze in Italy. Digitization projects are also underway with the U.K.’s famed scientific and medical library, The Wellcome, and with the National Library of the Netherlands.
eMOP is led by Texas A&M Professors Laura Mandell, Director of the Initiative for Digital Humanities, Media, and Culture (IDHMC); Ricardo Gutierrez-Osuna of Computer Science; and Richard Furuta, Director of the Center for the Study of Digital Libraries (CSDL), along with Anton DuPlessis and Todd Samuelson, book historians from Cushing Rare Books Library. The scholars earned a two-year, $734,000 development grant from the Andrew W. Mellon Foundation to support the work. ProQuest is one of a variety of participating publishers and software organizations that are collaborating on the project.
ProQuest connects people with vetted, reliable information. Key to serious research, the company has forged a 70-year reputation as a gateway to the world’s knowledge – from dissertations to governmental and cultural archives to news, in all its forms. Its role is essential to libraries and other organizations whose missions depend on the management and delivery of complete, trustworthy information.
ProQuest’s massive information pool is made accessible in research environments that accelerate productivity, empowering users to discover, create, and share knowledge.
An energetic, fast-growing organization, ProQuest includes the ProQuest®, Bowker®, Dialog®, ebrary®, and Serials Solutions® businesses and notable research tools such as the RefWorks® and Pivot™ services, as well as its Summon® web-scale discovery service. The company is headquartered in Ann Arbor, Michigan, with offices around the world.