economics

May 3, 2010

When History Is Compiled 140 Characters at a Time

Filed under: Uncategorized — ktetaichinh @ 4:08 am

The Twitter archive, which was “born digital,” as archivists say, will be easily searchable by machine — unlike family letters and diaries gathering dust in attics.

As a written record, Tweets are very close to the originating thoughts. “Most of our sources are written after the fact, mediated by memory — sometimes false memory,” Ms. Taylor said. “And newspapers are mediated by editors. Tweets take you right into the moment in a way that no other sources do. That’s what is so exciting.”

Twitter messages preserve witness accounts of an extraordinary variety of events all over the planet. “In the past, some people were able on site to write about, or sketch, as a witness to an event like the hanging of John Brown,” said William G. Thomas III, a professor of history at the University of Nebraska-Lincoln. “But that’s a very rare, exceptional historical record.”

Ten billion Twitter messages take up little storage space: about five terabytes of data. (A two-terabyte hard drive can be found for less than $150.) And Twitter says the archive will be a bit smaller when it is sent to the library. Before transferring it, the company will remove the messages of users who opted to designate their account “protected,” so that only people who obtain their explicit permission can follow them.
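The storage figures above imply an average size per message, which can be checked with a rough calculation. This is only a sketch: it assumes decimal terabytes (10^12 bytes) and takes the $150 drive price as an upper bound; the variable names are illustrative.

```python
import math

# Figures quoted in the article.
MESSAGES = 10 * 10**9          # ten billion Twitter messages
ARCHIVE_TB = 5                 # about five terabytes in total
DRIVE_TB = 2                   # capacity of one consumer hard drive
DRIVE_PRICE_USD = 150          # "less than $150", used here as an upper bound

# Assumption: one terabyte is 10**12 bytes (decimal, as drive makers count it).
BYTES_PER_TB = 10**12

# Average stored size per message, including its supplemental data.
bytes_per_message = ARCHIVE_TB * BYTES_PER_TB / MESSAGES

# Whole drives needed to hold the archive, and an upper bound on their cost.
drives_needed = math.ceil(ARCHIVE_TB / DRIVE_TB)
max_cost_usd = drives_needed * DRIVE_PRICE_USD

print(f"{bytes_per_message:.0f} bytes per message")
print(f"{drives_needed} drives, under ${max_cost_usd}")
```

Under these assumptions each message averages about 500 bytes, and the whole archive fits on three consumer drives.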

A Twitter user can also elect to use a pseudonym and not share any personally identifying information. Twitter does not add identity tags that match its users to real people.

Each message is accompanied by some tidbits of supplemental information, like the number of followers that the author had at the time and how many users the author was following. While Mr. Cohen said it would be useful for a historian to know who the followers and the followed are, this information is not included in the Tweet itself.

But there’s nothing private about who follows whom among users of Twitter’s unprotected, public accounts. This information is displayed both at Twitter’s own site and in applications developed by third parties whom Twitter welcomes to tap its database.

Alexander Macgillivray, Twitter’s general counsel, said, “From the beginning, Twitter has been a public and open service.” Twitter’s privacy policy states: “Our services are primarily designed to help you share information with the world. Most of the information you provide to us is information you are asking us to make public.”

Mr. Macgillivray added, “That’s why, when we were revising our privacy policy, we toyed with the idea of calling it our ‘public policy.’ ” He said the company would have done so but California law required that it have a “privacy policy” labeled as such.

Even though public Tweets were always intended for everyone’s eyes, the Library of Congress is skittish about stepping anywhere in the vicinity of a controversy. Martha Anderson, director of the National Digital Information Infrastructure and Preservation Program at the library, said, “There’s concern about privacy issues in the near term and we’re sensitive to these concerns.”

The library will embargo messages for six months after their original transmission. If that is not enough to put privacy issues to rest, she said, “We may have to filter certain things or wait longer to make them available.” The library plans to dole out its access to its Twitter archive only to those whom Ms. Anderson called “qualified researchers.”

BUT the library’s restrictions on access will not matter. Mr. Macgillivray at Twitter said his company would be turning over copies of its public archive to Google, Yahoo and Microsoft, too. These companies already receive the stream of current Twitter messages instantaneously. When the archive of older Tweets is added to their data storehouses, they will have a complete, constantly updated set, and users won’t encounter a six-month embargo.

Google already offers its users Replay, the option of restricting a keyword search only to Tweets and to particular periods. It’s quickly reached from a search results page. (Click on “Show options,” then “Updates,” then a particular place on the timeline.)

A tool like Google Replay is helpful in focusing on one topic. But it displays only 10 Tweets at a time. To browse 10 billion — let’s see, figuring six seconds for a quick scan of each screen — would require about 190 sleepless years.
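The 190-year estimate can be reproduced from the numbers in the paragraph above. A minimal sketch, assuming a 365-day year and nonstop browsing; the function and constant names are illustrative.

```python
# Figures quoted in the article.
TOTAL_TWEETS = 10_000_000_000   # 10 billion Tweets
TWEETS_PER_SCREEN = 10          # Replay shows 10 Tweets at a time
SECONDS_PER_SCREEN = 6          # a quick scan of each screen

# Assumption: a 365-day year with no breaks ("sleepless").
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_browse(total, per_screen, seconds_per_screen):
    """Return the years needed to scan every screen of results once."""
    screens = total / per_screen
    return screens * seconds_per_screen / SECONDS_PER_YEAR


years = years_to_browse(TOTAL_TWEETS, TWEETS_PER_SCREEN, SECONDS_PER_SCREEN)
print(f"about {years:.0f} years")
```

One billion screens at six seconds each comes to six billion seconds, which is a little over 190 years.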

Mr. Cohen encourages historians to find new tools and methods for mining the “staggeringly large historical record” of Tweets. This will require a different approach, he said, one that lets go of straightforward “anecdotal history.”

In the end, perhaps quality will emerge from sheer quantity.

Randall Stross is an author based in Silicon Valley and a professor of business at San Jose State University. E-mail: stross@nytimes.com.
