The Twitter archive, which was “born digital,” as archivists say, will be easily searchable by machine — unlike family letters and diaries gathering dust in attics.
As a written record, Tweets are very close to the originating thoughts. “Most of our sources are written after the fact, mediated by memory — sometimes false memory,” Ms. Taylor said. “And newspapers are mediated by editors. Tweets take you right into the moment in a way that no other sources do. That’s what is so exciting.”
Twitter messages preserve witness accounts of an extraordinary variety of events all over the planet. “In the past, some people were able on site to write about, or sketch, as a witness to an event like the hanging of John Brown,” said William G. Thomas III, a professor of history at the University of Nebraska-Lincoln. “But that’s a very rare, exceptional historical record.”
Ten billion Twitter messages take up little storage space: about five terabytes of data. (A two-terabyte hard drive can be found for less than $150.) And Twitter says the archive will be a bit smaller when it is sent to the library. Before transferring it, the company will remove the messages of users who opted to designate their account “protected,” so that only people who obtain their explicit permission can follow them.
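The storage figures above imply a small per-message footprint, and a quick back-of-the-envelope check makes that concrete. This sketch assumes decimal terabytes (10^12 bytes) and treats the quoted $150 drive price as an upper bound; the variable names are illustrative only.

```python
import math

# Figures quoted in the article; decimal terabytes are an assumption.
ARCHIVE_BYTES = 5 * 10**12       # about five terabytes
MESSAGE_COUNT = 10 * 10**9       # ten billion messages
DRIVE_TERABYTES = 2              # capacity of one consumer hard drive
DRIVE_PRICE_CEILING = 150        # "less than $150" per drive

# Average size of one stored message, including its supplemental data.
bytes_per_message = ARCHIVE_BYTES / MESSAGE_COUNT

# Whole drives needed to hold the archive, and the most they could cost.
drives_needed = math.ceil(5 / DRIVE_TERABYTES)
cost_ceiling = drives_needed * DRIVE_PRICE_CEILING

print(f"{bytes_per_message:.0f} bytes per message")
print(f"{drives_needed} drives, under ${cost_ceiling}")
```

Under these assumptions, each message averages about 500 bytes, and the whole archive would fit on three such drives.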
A Twitter user can also elect to use a pseudonym and not share any personally identifying information. Twitter does not add identity tags that match its users to real people.
Each message is accompanied by some tidbits of supplemental information, like the number of followers that the author had at the time and how many users the author was following. While Mr. Cohen said it would be useful for a historian to know who the followers and the followed are, this information is not included in the Tweet itself.
But there’s nothing private about who follows whom among users of Twitter’s unprotected, public accounts. This information is displayed both at Twitter’s own site and in applications developed by third parties whom Twitter welcomes to tap its database.
Even though public Tweets were always intended for everyone’s eyes, the Library of Congress is skittish about stepping anywhere in the vicinity of a controversy. Martha Anderson, director of the National Digital Information Infrastructure and Preservation Program at the library, said, “There’s concern about privacy issues in the near term and we’re sensitive to these concerns.”
The library will embargo messages for six months after their original transmission. If that is not enough to put privacy issues to rest, she said, “We may have to filter certain things or wait longer to make them available.” The library plans to dole out access to its Twitter archive only to those whom Ms. Anderson called “qualified researchers.”
But the library’s restrictions on access will not matter. Mr. Macgillivray at Twitter said his company would be turning over copies of its public archive to Google, Yahoo and Microsoft, too. These companies already receive the stream of current Twitter messages instantaneously. When the archive of older Tweets is added to their data storehouses, they will have a complete, constantly updated set, and users won’t encounter a six-month embargo.
Google already offers its users Replay, the option of restricting a keyword search only to Tweets and to particular periods. It’s quickly reached from a search results page. (Click on “Show options,” then “Updates,” then a particular place on the timeline.)
A tool like Google Replay is helpful in focusing on one topic. But it displays only 10 Tweets at a time. To browse 10 billion — let’s see, figuring six seconds for a quick scan of each screen — would require about 190 sleepless years.
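The 190-year estimate can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the figures given above (10 billion Tweets, 10 per screen, six seconds per screen); the 365.25-day year is an assumption.

```python
# Figures from the article.
TOTAL_TWEETS = 10_000_000_000
TWEETS_PER_SCREEN = 10
SECONDS_PER_SCREEN = 6

# Assumption: an average year of 365.25 days, with no breaks for sleep.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

screens = TOTAL_TWEETS // TWEETS_PER_SCREEN     # one billion screens
total_seconds = screens * SECONDS_PER_SCREEN    # six billion seconds
years = total_seconds / SECONDS_PER_YEAR

print(f"{screens:,} screens, about {years:.0f} years")
```

One billion screens at six seconds each comes to six billion seconds, which is roughly 190 years of uninterrupted reading.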
Mr. Cohen encourages historians to find new tools and methods for mining the “staggeringly large historical record” of Tweets. This will require a different approach, he said, one that lets go of straightforward “anecdotal history.”
In the end, perhaps quality will emerge from sheer quantity.
Randall Stross is an author based in Silicon Valley and a professor of business at San Jose State University. E-mail: firstname.lastname@example.org.