Library of Congress Sits on 170-Billion-Tweet Archive (Every Public Tweet Ever Posted)
The Library of Congress has been collecting an archive of Twitter messages since 2010, and it has now amassed roughly 170 billion tweets.
As part of this effort, the U.S. government institution made a deal with Twitter in 2010. Under the terms of the agreement, Twitter is donating a full stream of all public tweets to the Library of Congress. That archive includes 21 billion tweets generated between 2006 and April 2010, as well as 150 billion more tweets posted since then.
"Twitter is a new kind of collection for the Library of Congress but an important one to its mission," wrote the Library in a blog post on Friday, Jan. 4. "As society turns to social media as a primary method of communication and creative expression, social media is supplementing, and in some cases supplanting, letters, journals, serial publications, and other sources routinely collected by research libraries."
It remains unclear, however, how the Library of Congress plans to use this ongoing archive. The institution gave no indication of whether it will eventually make the archive public, but it did issue a white paper outlining the project.
"Though the Library has been building and stabilizing the archive and has not yet offered researchers access, we have nevertheless received approximately 400 inquiries from researchers all over the world," explained the Jan. 4 announcement. "Some broad topics of interest expressed by researchers run from patterns in the rise of citizen journalism and elected officials' communications to tracking vaccination rates and predicting stock market activity."
According to the Library of Congress, the two full copies it keeps of the entire Twitter archive (170 billion tweets) together amount to roughly 133 terabytes of data, and each tweet carries about 50 accompanying metadata fields. The Library has not yet started sorting or filtering that 133TB of Twitter data, which is why it hasn't been able to make this massive archive available to researchers.
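For a rough sense of scale, those figures can be turned into an average storage footprint per tweet. The sketch below is a back-of-the-envelope calculation, not anything published by the Library: it assumes decimal terabytes (10^12 bytes), two equally sized copies, and exactly 170 billion tweets, and the function name `bytes_per_tweet` is purely illustrative.

```python
# Back-of-the-envelope estimate built only from the figures quoted above.
# Assumptions (ours, not the Library's): 1 TB = 10**12 bytes, the two
# copies are the same size, and the tweet count is exactly 170 billion.
TOTAL_BYTES = 133 * 10**12   # roughly 133 terabytes across both copies
COPIES = 2                   # two full copies of the archive
TWEETS = 170 * 10**9         # roughly 170 billion tweets


def bytes_per_tweet(total_bytes: int, copies: int, tweets: int) -> float:
    """Average stored bytes per tweet within a single copy."""
    return total_bytes / copies / tweets


per_tweet = bytes_per_tweet(TOTAL_BYTES, COPIES, TWEETS)
print(f"{per_tweet:.0f} bytes per tweet per copy")  # prints "391 bytes per tweet per copy"
```

Under those assumptions, each tweet, including its metadata fields, averages a little under 400 bytes per copy. The result shifts if the Library counts terabytes in binary units or if the copies differ in size.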
In today's tech-driven world, the Internet and social media have gained considerable ground, and in many instances platforms such as Twitter and Facebook are the fastest way to spread the word. In some ways, these services are modern-day versions of journals, logbooks, or history books, with activity logs from millions of users.