Gutenberg does the Bay: Notes and Thoughts from December 10-15, 2003

I was in San Francisco December 10-15 for meetings, presentations and discussion related to my role as head of the Project Gutenberg Literary Archive Foundation (PGLAF). We brought a bunch of free PG DVDs and CDs, and gave away over 2 million eBooks each night.

At two public presentations, I gave an overview of Project Gutenberg activities and an invitation to get involved: by subscribing to our mailing lists, proofing "a page a day" at Distributed Proofreaders (DP), helping to distribute free CDs and DVDs, and otherwise giving away free eBooks. Then Michael Hart (founder of PG) gave an overview of where PG has been and where it's going, on the way from 10,000 eBooks (as of October 15, 2003) to a goal of 1,000,000 eBooks. If PG's history since 1990 of doubling production every 18 months can be maintained (thereby tracking Moore's Law), the million-book goal will be achieved in about a decade.

On Wednesday December 10 we made a presentation at the Golden Gate Club in the Presidio of San Francisco. It was hosted by the Internet Archive (IA), and included a little wine & cheese. We sent out a press release and invited media, and a few came. Most of the 45 or 50 attendees were people who are familiar with the Internet Archive's work and had some knowledge of Project Gutenberg.

A closing highlight of Wednesday was a presentation by Brewster Kahle, our host at the Internet Archive. He announced an important milestone on the path to identifying the 90% of books published from 1923 through 1964 that are in the public domain in the US but cannot readily be identified as such. The milestone was Distributed Proofreaders' completion of the Library of Congress' copyright renewal volumes from 1950 through 1977 (this date range is important -- see the Copyright HOWTO, as well as "Rule 6" at gutenberg.net). Brewster made a generous donation to PGLAF as thanks for the accomplishment.
In the near future, PG will work with the IA and others to automate the identification of public domain materials from 1923-1964.

On Thursday December 11 we gave a presentation at the Berkeley Public Library. The director of the recently renovated library, Jackie Griffin (who said, "I never GOT the Internet until I found Project Gutenberg"), introduced Michael and me, and the talk was similar in content to Wednesday's but with a different crowd. The Berkeley audience was less interested in the technologies of PG than in the literary and societal implications of eBooks. Both evenings included a lively question and answer session.

For the first time, on December 11, I played Joel Erickson's "Project Gutenberg Fanfare," a musical piece he created to honor PG's 10,000th eBook and dedicated to the public domain. It was an exciting moment.

Friday December 12 was a busy day. Michael and I, along with Alev (PG's chief cataloger), visited Google's headquarters in Mountain View. We spent 90 minutes talking about ways to make PG's content more easily usable by Google, and about potential partnerships in which Google could help PG create more content. We also got to enjoy Friday's gourmet lunch.

Then we drove to the TechTV studios in South San Francisco. Although there was no live broadcast that day, we had arranged for a 7-minute taped segment of The Screen Savers for broadcast on Monday December 15. Leo Laporte, the show's host, had interviewed Michael earlier in the year. Leo interviewed both of us, and we got a brief chance to show some of the PG innovations: multimedia eBooks, DP, and some of our new "find an eBook" features. We talked before and after the taping with Leo and Patrick (co-host), and met Yoshi (super-modification guy). We were impressed with how much they knew about Project Gutenberg and how much they supported our efforts.

Friday evening, we officially started PG's first "capacity building" conference at the IA offices. About a dozen PG volunteers participated. We started with dinner nearby, then talked informally. Saturday, we met from 9:00 am to 5:30 pm (lunch was delivered), and continued our discussions over dinner. Sunday, we wrapped up shortly after noon. Michael said, "This is the best conference I've ever been to."

There were many great ideas at the conference. There was also rich communication among people who, for the most part, had never met each other face to face. Some of the main things impacting my thinking were:

- PG has a shortage of volunteers, especially those willing to take on project leadership. For example, page proofreading at DP is going very well, but many items are not getting completed in a timely fashion because there are not enough people doing post-processing.

- To maintain the growth curve, we need a combination of more volunteers, an evolving organizational structure, better technology and more automation.

- PG is "state of the art" in eBook production, and is continuing to push the envelope. The forthcoming on-the-fly conversion from XML to other formats will really increase value to readers.

- Our newly revamped catalog is great, but we need to work on getting subject headings into the catalog, and on doing better with things like illustrator credits (and whether illustrations are included), contents listings (for collections and periodicals) and content notes. Some of this can be done automatically; some will need the attention of human catalogers.

- We need to get more help for Web pages, and should roll the pages into a CVS or (minimally) rsync container to make upkeep and contributions easier.

- Overall, support for readers needs to be a lot better. There are lots of ideas and possibilities, and it's likely that other affiliate projects (rather than PG itself) should run some of them:
  . reader reviews of eBooks
  . support for communities of interest (slash sites, wiki, Web boards, etc.)
  . online help & FAQ-o-matic
  . bug processing
  . other meta-information about books and authors
  . areas for display and creation of public domain derivative works

- The PG "signature" is to create high-quality eBooks. These eBooks have the following characteristics:
  . nearly all are public domain
  . they are editable, and suitable for downloading, creating derivative works, etc.
  . they are carefully proofread and nicely presented
  . they are easily findable using a variety of search techniques
  . they may be read online (via a Web browser), or downloaded
  . they include a plain text version whenever feasible, but over 20 other formats are available -- depending on what the eBook's producer wants to offer. More than half of new eBooks include HTML as well as text.

Additionally, part of the "signature" of PG is that it is low-budget and volunteer-driven. Relatively little active collection development or collection policy exists; instead, we work on the principle of letting volunteers choose the literary works that they believe are interesting, worthwhile and important.

I can quickly summarize three large tasks before us, just for our next 6 months of effort. Lots of other activities (eBook production, volunteer recruitment, etc.) also need to continue.

1. Continued work on finding aids, especially subject cataloging and other items beyond author and title. This is tied to our near-term plan to implement a "from copyright clearance through upload" metadata block that will make sure we keep high-quality cataloging information with the book, and are able to fix errors or augment it as needed.

2. Conversion on the fly. This will be very useful for readers who want anything from plain text (with particular margins or other settings), through eBook reader formats, to variations on HTML and PDF output.
Other formats include Braille, MP3 text-to-speech audio, and any other reasonable format for a particular eBook. The main immediate activity will be generation "at birth" of XML markup (using the teixlite DTD) for all eBooks from DP.

3. Reader community tools & support. This has not been big on my radar before, but I now believe it's probably the single most important new initiative for PG. A wide variety of tools, some of which are listed above, exist for readers on other sites. At the Internet Archive, recommender systems track popular items, including new items. Amazon and other sites allow readers to post reviews. Wikipedia lets communities author, edit and revise their own content, and slash code sites feature community-moderated news and discussion. It remains to be seen which of these should be part of the PG Web pages, which should be on affiliate sites, and which should be encouraged but kept separate. We'll probably start with a few of the easiest things on our own sites, and try to attract reader communities (such as fans of particular authors or genres) to work on some of their own support tools.

Overall, I think the Project Gutenberg organization is very strong. Our greatest strengths are our volunteers, our collection, and our technologies. Our weaknesses lie primarily in our reputation for having eBooks that are difficult to find and use, and that might suffer from poor quality control -- our current processes have overcome these problems, but the reputation remains. Our opportunities are to grow the collection and volunteer base, and also to reach out to reader communities. The main threat to Project Gutenberg is the possibility of loss of perceived relevance (resulting in loss of volunteers and therefore loss of new eBook production).
This could happen if big commerce suddenly creates (and gives away) very large eBook collections, or if any of the many other eBook projects starts to produce items of "Project Gutenberg quality" and eclipses PG, or somehow manages to co-opt or subsume our efforts.

Activities peripheral to eBook production and reader community support will remain important for the future. One of the most important of these is PG's work to enhance the public domain. This includes the copyright renewal registry mentioned above, but also includes direct action to challenge copyright term extensions and other legislation that has crippled the growth of the public domain.

At the conclusion of the conference, participants expressed a willingness and desire to re-convene in the future -- in a different location, and hopefully with people who could not make it this time.

As a low-budget, highly distributed volunteer effort, Project Gutenberg has a long track record of success. But like all organisms, PG needs to grow and change in order to continue to live. Thanks to the hard work and thought of everyone who attended, and of those who could not attend but contributed via our email and Web forums, we are now able to work on a road map to our future.
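
A note on the million-book projection from Michael's talk: the "about a decade" figure can be checked with a few lines of arithmetic. This is only a rough sketch. It assumes that "doubling production every 18 months" means the size of the collection doubles every 18 months, and the function name years_to_reach is made up for illustration.

```python
import math


def years_to_reach(current: float, goal: float, doubling_months: float = 18.0) -> float:
    """Years needed to grow from `current` to `goal` if the total doubles
    every `doubling_months` months (assumption: steady exponential growth)."""
    doublings = math.log2(goal / current)   # number of doublings required
    return doublings * doubling_months / 12.0


# Figures from the talk: 10,000 eBooks now, 1,000,000 as the goal.
years = years_to_reach(10_000, 1_000_000)
print(f"{years:.1f} years")  # prints "10.0 years"
```

Going from 10,000 to 1,000,000 eBooks takes a little under seven doublings, which at 18 months each comes to roughly ten years, consistent with the estimate given in the talk.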