In the Digital Era, Publication Isn’t Preservation
When a print book is published, its metadata is literally attached to its content. The author and title, publisher and imprint, price, ISBN and barcode, as well as the book's size, shape, and binding, are clear and easily referenced at a glance. Part of the complicated process that HathiTrust, the Internet Archive, and Google, for example, faced in digitizing books was working out how to record all of this information and connect it to a digital file. For scanned e-books, and even more so for born-digital ones, the matter is more complex, as Digital Book World pointed out earlier this month (Brian O’Leary clearly agrees).
First, it is not always obvious whether the metadata is correct. Just as important, the connection between metadata and digital book content is more tenuous. Without a hard copy to refer back to, a piece of information that goes missing may be unrecoverable. For this reason (among others), the Internet Archive has been keeping physical copies of the books it scans that would otherwise be discarded, and in fact of every book it can get its hands on.
Digital archivists have a lot to say on this matter, since they have been dealing with the instability of digital files (not to mention evolving file formats, software, storage, and hardware) for years. They are probably also the first to point out something that may seem counterintuitive: digitization and cloud storage do not necessarily ensure longevity. We may be steeped in the fear that digital files will exist forever (those Friendster/Myspace/Facebook/Twitter images of our teenage selves haunting us all our lives), but anyone whose computer has crashed, taking photos and Word documents with it, knows this probably won’t be the case.
In response to the common (read: false) assumption that in the digital age we are over-archiving our lives, an “anti-archival” stance seems to be garnering support. In fact, we are not keeping copies of e-books themselves, or tracking their technical evolution and divergence, in any acknowledged, systematic way.*
Catherine C. Marshall, a Principal Researcher in Microsoft Research’s Silicon Valley Lab, is one of the most outspoken scholars on digital archiving, especially with regard to the complexities of personal digital identities and of archiving them. “It might be just as traumatic if everything survives the passage of time intact as it would be if nothing does,” she points out in her 2011 essay “Challenges and Opportunities in Personal Digital Archiving” in I, Digital: Personal Collections in the Digital Era.
The codex (the “print” book as we know it) was a major technological innovation for many reasons, and it has lasted so long as a delivery technology (individual copies have lasted a long time, too) because it is so stable. Paper is strong, and pages packed tightly together between two covers are well protected. We also put books in libraries, public and personal, where they are further protected.**
This brings me, finally, to my point: digital publication doesn’t just look different from print publication; it is by nature less stable. The machines, operating systems, and digital rights management software we use to access e-books are constantly evolving, and so are what we can do with e-books, how they are written, edited, and encoded, and what kinds of media they hold. Like everything else, all that is digital eventually falls apart. Terms like “bit rot” (the gradual corruption of stored data) and “link rot” (the decay of hyperlinks as the pages they point to move or disappear) may not resonate yet, but they should.
If subscription models continue to take off, if libraries remain unable to buy permanent access to e-books outright for their patrons, and if even the Library of Congress and other archival repositories are unable to accession original e-books and save copies for posterity, we could lose something of great value. Imagine if nobody had saved original editions of the Gutenberg Bible, The Canterbury Tales, or Don Quixote: how much less would we know about our literary heritage?
*Outside of frequently challenged (and illegal) repositories of pirated e-books, such as The Pirate Bay and the now-closed Library.nu, where metadata and accuracy are generally acknowledged to be unreliable.
**On the value of keeping lots of copies of books, see Library: An Unquiet History by Matthew Battles, a former librarian and currently Digital Projects Producer and Editor at the metaLab at Harvard.
Tablet reading concept via Shutterstock