The Disappearing Web: How we're losing the battle to preserve the Internet
The Web may be less permanent than we once thought. According to archivists, after two years, 27 percent of social media, pictures, video, and blog posts vanish. For many who regret oversharing, this may be welcome news. But for historians eager to document the tweets that inspired the Arab Spring or who want a snapshot of how the Web looked on September 9, 2001, the impermanence of the Internet presents a challenge.
A recent report by researchers Michael Nelson and Hany SalahEldeen of Old Dominion University in Virginia found that after one year, about 11 percent of resources shared online are lost, and that resources continue to disappear after that at a rate of 0.02 percent per day.
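To make those two figures concrete, here is a minimal sketch that treats them as a straight line: 11 percent gone at the one-year mark, plus 0.02 percentage points for every day after that. The function name, the constants, and the linear reading itself are assumptions made for illustration; the report may model the loss differently.

```python
# Illustrative only: a straight-line reading of the two reported figures.
FIRST_YEAR_LOSS_PERCENT = 11.0  # reported share lost after the first year
DAILY_LOSS_PERCENT = 0.02       # reported additional loss per day after that


def estimated_loss_percent(days_since_posting: int) -> float:
    """Estimate the percentage of shared resources lost after a number of days.

    Hypothetical helper: only defined from the one-year mark onward,
    because the quoted figures say nothing about the first 365 days.
    """
    if days_since_posting < 365:
        raise ValueError("the quoted figures only cover one year or more")
    extra_days = days_since_posting - 365
    estimate = FIRST_YEAR_LOSS_PERCENT + DAILY_LOSS_PERCENT * extra_days
    # A loss share cannot exceed 100 percent.
    return min(estimate, 100.0)
```

This is an extrapolation from the two numbers quoted here, not a result taken from the study, so its output may not match figures measured directly by the researchers.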
The two-year study examined Web archiving and the longevity of content posted to the Internet. It found that reference pages such as blog posts and tweets tend to stick around, while the primary sources that those pointers link to, such as videos, photos, and first-person documentation of significant events, quickly disappear.
People generally don't care about losing pictures of their cats, Nelson says. But he points to the Egyptian revolution, and the role that social media played in shaping that movement, as being of considerable long-term interest.
“We need to hold on to these resources, but we lack the infrastructure to hold on to these things with any level of longevity,” Nelson says.
Saving Internet content from the digital dustbin
Today many efforts are underway to preserve the Web. Founded in 1996, Internet Archive is one of the most notable of these endeavors, with 160 billion archived Web pages searchable by the public. The site recently introduced a TV news broadcast archive collection as well. Other services, such as Topsy—which claims one of the largest archives of searchable tweets dating back to 2008, two years after the founding of Twitter—specialize in helping you explore Twitter archives.
But the Old Dominion University researchers question whether tweets retain much of their historical value once they are stripped of context, links, and details such as whether something was retweeted.
“Some of the Egyptian tweets said, ‘I can’t believe this, oh my goodness,’ and then [included] a link to a picture,” Nelson says. “You have no idea what that person was reacting to at the time. If this is scaled up for all these tweets, then the loss becomes significant.”
What we have already lost
Unfortunately, preserving our digital legacy is complicated. What is worth preserving? How do we determine what is historically relevant? Do we save too much?
Internet Archive founder Brewster Kahle says the Internet is “the publishing medium of our time.” Losing important early Internet databases such as Usenet, where thousands of newsgroups still exist today, would be the equivalent of losing great works of fiction from past generations.
The now-defunct Deja News collected old Usenet discussions. Google purchased Deja News in 2001 and folded it into Google Groups. The Google Groups archive of Usenet discussions dates back to 1981, with more than 800 million messages that document everything from the first mention of the CD player to Tim Berners-Lee's postings on his World Wide Web project. The archive is searchable, but accessing it isn't the most intuitive process.
“We don’t necessarily need to record every cat video,” Kahle says. “But there are works that do make sense to not only preserve, but keep referenceable.”
The challenge lies in curating a collection so vast, retaining context while filtering out the fluff from history as it unfolds.
“We just try to collect it all,” Kahle says.
The CompuServe bulletin boards of the late 1980s and early 1990s are gone, Kahle says, along with culturally and historically significant discussions about early technology.
There was no automatic archiving for shuttered services such as Kodak Gallery, AOL Pictures, and PhotoWorks, all once-bustling places to share and store digital pictures that are now gone.
If you're lucky, the Internet Archive may help you find that personal website you created on Yahoo's GeoCities, once the third most popular destination on the Web. Then again, it might be just as true to say that if you're lucky, your GeoCities page and your old MySpace page are gone for good.
Preserving your personal history
Websites such as social networking curator Storify let users preserve their own digital footprints. The service allows you to tell stories on any topic by blending social media, including tweets, videos, and links to other online resources, and arranging them in a timeline.
For instance, the recent presidential debate was the subject of many Storify entries that recounted the evening using excerpted tweets, YouTube clips, and other media. But even when those collections remain, if the original poster deletes the tweet, image, or video, the Storify entry is left with a hole. Without contextual clues, the original content may be difficult or impossible to track down again, Nelson says.
Facebook says that it won’t delete your account, no matter how long it lies dormant. It will exist as long as Facebook does. On the other hand, extended inactivity on Twitter (going longer than six months without use) could cause the site to delete your account.
When it comes to preserving or archiving our digital lives, it's nearly impossible to guess what will be valuable over time, says Tom Woolley, curator of new media for Britain’s National Media Museum and curator of the museum’s Life Online gallery.
“In 20, 30, 50 years’ time, we’ll see there was stuff we should have saved,” Woolley says.
Maybe the content of a person's MySpace page or Twitter account isn’t historically relevant, in the grander scheme of things, but those personal sites are a snapshot of life and culture from a very specific time, with a value all their own in terms of “capturing our own history and our own written word,” Woolley says.