The New Libraries of Alexandria

Digital book preservation eases the burden on literary scholars and historians while making historically important literature widely available. In the past, scholars had to rely on one or two fragmentary manuscripts that likely contained inconsistencies. The burning of the Library of Alexandria was a serious blow to literary history because extra copies were tedious and expensive to produce. Even in the age of the printing press and mass-produced books, paper decays. Anyone who wants to preserve old paper must take special precautions, and organizations such as Project Gutenberg offer a way around the problem. The average paperback is printed with short-term profit in mind, not longevity. After all, the smell of old books comes from the breakdown of chemicals within the paper itself.

Plenty of books are worth preserving, even in a world where cheap, by-the-numbers romance novels dominate every grocery store’s shelves. No one can know yet which books will prove important. Books are snapshots of their time and, as such, are necessary for a more complete view of history. Digital preservation seeks to make as many books as possible available digitally to anyone who wants to read them.

Project Gutenberg

The most successful group has been Project Gutenberg (PG), which is at the forefront of digital preservation. Its name refers to the printing press, but this press makes its strides digitally. It describes itself as “the original, and oldest, etext project on the Internet, founded in 1971.” Michael Hart had, essentially, indefinite access to a mainframe computer at the University of Illinois and a simple premise: “anything that can be entered into a computer can be reproduced indefinitely.” Even in the 1970s, he concluded that something recorded digitally could be reproduced in any number of copies.

The internet is a dream come true for Hart’s idea of spreading books as far as possible. As PG developed, a philosophy developed with it. The first part of this philosophy is “The Project Gutenberg Etexts should cost so little that no one will really care how much they cost. They should be a general size that fits on the standard media of the time.” As such, the reader has the simplest access possible.

The texts are transcribed in American Standard Code for Information Interchange, or ASCII, one of the simplest ways to encode text: “Plain Vanilla ASCII can be read, written, copied and printed by just about every simple text editor on every computer in the world.” Basically, any written book can be entered in this format and converted from there. A glance through the Project Gutenberg catalog shows that you can download a book as EPUB, Kindle, or plain text, or just read it online in HTML. The texts are easily converted into any of these formats from the initial input.
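To make the idea of "Plain Vanilla ASCII" concrete, here is a minimal Python sketch that checks whether a passage uses only 7-bit ASCII characters and lists any characters that fall outside that range. The function names are hypothetical and chosen for illustration; this is not Project Gutenberg's own tooling.

```python
def is_plain_ascii(text):
    """Return True if every character falls in the 7-bit ASCII range (0-127)."""
    return all(ord(ch) < 128 for ch in text)


def non_ascii_characters(text):
    """Return the distinct non-ASCII characters, in order of first appearance."""
    seen = []
    for ch in text:
        if ord(ch) >= 128 and ch not in seen:
            seen.append(ch)
    return seen


if __name__ == "__main__":
    print(is_plain_ascii("Call me Ishmael."))
    print(non_ascii_characters("naïve café, naïve"))
```

A transcriber could run a check like this before submitting a text, then decide how to represent any characters it reports.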

The second part of the philosophy states that “The Project Gutenberg Etexts should be so easily used that no one should ever have to care about how to use, read, quote and search them.” The simple ASCII foundation also makes the texts easy to find with a simple search. The website is simple to use, in the spirit of this quote from the philosophy page:

We love it when we hear about kids or grandparents taking each other to an etexts to Peter Pan when they come back from watching HOOK at the movies, or when they read Alice in Wonderland after seeing it on TV. We have also been told that nearly every Star Trek movie has quoted current Project Gutenberg etext releases (from Moby Dick in The Wrath of Khan; a Peter Pan quote finishing up the most recent, etc.) not to mention a reference to Through the Looking-Glass in JFK.

The point of searchability is that you can look for phrases you’ve heard in conversations, quotes you saw at the beginning of movies, and the names of authors you are interested in.
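Searching plain text for a half-remembered phrase takes very little code. The sketch below assumes a small in-memory collection that maps titles to their text. The `find_phrase` function and the sample corpus are hypothetical illustrations, not how Project Gutenberg's site search is actually implemented.

```python
def find_phrase(corpus, phrase):
    """Return the titles whose text contains the phrase.

    Matching ignores case and treats any run of whitespace (including
    line breaks) as a single space.
    """
    needle = " ".join(phrase.lower().split())
    matches = []
    for title, text in corpus.items():
        haystack = " ".join(text.lower().split())
        if needle in haystack:
            matches.append(title)
    return matches


if __name__ == "__main__":
    # Hypothetical two-book corpus with short opening excerpts.
    sample = {
        "Moby Dick": "Call me Ishmael. Some years ago, never mind how long\nprecisely...",
        "Peter Pan": "All children, except one, grow up.",
    }
    print(find_phrase(sample, "call me ishmael"))
```

Because the texts are plain ASCII, no special parsing is needed before searching; the same approach works on any file a text editor can open.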

Project Gutenberg’s management structure is what makes this accessibility possible. PG is a non-profit organization run by Dr. Gregory B. Newby, its volunteer CEO. The books are all submitted to the project by volunteers as well, so PG does not have to worry about maintaining a paid staff.

PG only publishes what is in the public domain. As soon as a work enters the public domain and a volunteer shows interest in it, PG can accept a submission. PG thereby avoids the potential legal issues that come with the tricky world of copying works of literature. From the volunteer force to steering clear of lawsuits, everything about Project Gutenberg is designed for the purpose of digital preservation and the distribution of texts.

Google Books

Issues arise when you overlook copyright laws, and Google Books is an excellent example. The idea is great: Google wanted to physically scan books and make them searchable. As the project’s own website states:

…in a future world in which vast collections of books are digitized, people would use a ‘web crawler’ to index the books’ content and analyze the connections between them, determining any given book’s relevance and usefulness by tracking the number and quality of citations from other books.

Google partnered with Harvard, the University of Michigan, the New York Public Library, Oxford, and Stanford. However, Google did not stick strictly to works free of copyright protection. The company encountered legal trouble that boils down to copyright law: “Plaintiffs, the Authors Guild, Inc. and individual copyright owners, complained that Google scanned more than twenty-million books without permission or payment of license fees.” The Authors Guild accused Google of doing the equivalent of walking into a library, scanning everything, and then putting it on the internet.

After a decade, Google won, but “the company all but shut down its scanning operation.” Its books are available to rent or buy, and are searchable, but the collection is a fractured database that has not been updated in recent years. The operation has been put to some use, though: “Through the HathiTrust Research Center, scholars can tap into the Google Books corpus and conduct computational analysis—looking for patterns in large amounts of text, for instance—without breaching copyright.” The project was ambitious and still has benefits today.

Million Book Project

The Million Book Project straddled the line between independent, volunteer projects and ambitious, big-tech business. It was a nonprofit effort that scanned physical copies of books. As its objective states, “The objective of this project is to create a free-to-read, searchable collection of one million books, primarily in the English language, available to everyone over the Internet.” Like Project Gutenberg, it simply sought to preserve books, with backing from universities in China, India, and even Egypt.

However, the Million Book Project was expensive, and it ultimately ended. As of January 2008, its website said it anticipated over ten million books in the next ten years. The project ended that same year. In addition to those universities, the website states, “National Science Foundation provided funding for Scanners, Computers, Servers, and Software.” Paying for and running all of these machines required a large web of funders and staff. The Million Book Project had a Rube Goldberg machine of sponsors and staff that was bound to break. Most of the texts could only be accessed through the Internet Archive’s efforts.

Project Gutenberg’s administrative structure is simple. Anyone with a computer can volunteer to type up books for Project Gutenberg. Google Books was ambitious, and ultimately tripped over the red tape it so often attempted to hop over. The Million Book Project was a phenomenal idea as well, but it had so many moving parts that something was bound to break down.

Each of these pioneers in the world of digital publishing made important strides and learned lessons that we can take with us into the future. Digital publishing gives unparalleled access to preserved literature to anyone who thinks to search Google for “What is the Library of Alexandria?”

Published

2019-10-11

WDRalph in Technology | 2019-10-11
