New Digital Archives Await Bush Records

For members of the Bush administration, Jan. 20, 2009, marks the end of a job. However, for the staff of the National Archives and Records Administration (NARA), it's just the beginning of a project unprecedented in size and scope: sorting, indexing, preserving and ensuring access to all the records, both paper and electronic, created by the administration over the past eight years.

In some ways, this is nothing new. Since 1978, when the Presidential Records Act was enacted, NARA has been tasked with taking custody of, controlling, preserving and providing access to all presidential and vice presidential records that have administrative, historical, informational or evidentiary value. Under the act, presidential records become the legal responsibility of the archivist of the U.S. on the day the president leaves office.

However, with the rise of electronic communications, the volume of electronic records has exploded. NARA received only a few hundred thousand e-mail messages from the first Bush presidency and 32 million from the Clinton White House, according to Ken Thibodeau, director of NARA's Electronic Records Archives (ERA) Program. The program's mission is to meet the many challenges stemming from the increasing use of computers in government, including building a new archiving system scheduled for completion in 2011. In comparison, NARA expects a whopping 140TB of data from the current Bush administration, more than 50 times what it received from the Clinton years. About 20TB of that is e-mail, Thibodeau says.

It hasn't helped that the Bush administration has been slow in providing NARA with needed information about the types and volume of data that will need to be archived. It wasn't until this summer that an intensive effort began to share information, Thibodeau says.

Much of the discussion has centered on how the White House will provide records in a format that is reasonably easy to use, since some of the systems are highly proprietary. "There's still some risk that some of it may not work exactly right, but we have a contingency plan: If that happens, we'll re-create the systems they have and access the records that way," he says.

Adding to the drama, questions have been raised about millions of e-mails missing from the period between March 2003 and October 2006. In early November, a lawsuit brought by Citizens for Responsibility and Ethics in Washington and the National Security Archive, challenging the White House's failure to properly store and recover millions of e-mails, was upheld. In 2002, the Executive Office of the President stopped using the Automated Records Management System, which had been in place since 1994 and automatically backed up all e-mails, but failed to install any other backup program.

But despite the controversy and opinions to the contrary, Thibodeau says NARA is prepared. In 1998, NARA began the process of building a system to preserve all types of electronic records created anywhere in the U.S. government, enable online transactions and collaboration with other agencies over the life cycle of government records, and provide access to these records to the public and government officials. The system, scheduled to be built in five increments, is slated for completion in 2011. The first increment, just completed in June, provides functional archives to preserve electronic data in its original format, enables disposition of agreements and scheduling, and receives unclassified and sensitive data from federal agencies.

By Dec. 5, the second increment, which will handle the presidential records portion of the ERA system, will be ready for the onslaught -- or as ready as it can be "when you're staring at 100TB of data bearing down on you," Thibodeau says. Even in this increment, however, the system will be used only by NARA staff and four pilot agencies, with public access slated for a later release.

The U.S. Government Accountability Office (GAO) has questioned the ERA's readiness, especially since the project has endured some bumps along the way, including delays and cost overruns estimated at $16.3 million. The life-cycle cost for the complete ERA system, scheduled to be completed in 2011, has been estimated at $453 million, including development contract costs, program management, research and development, and program office support.

As recently as September, after studying the system's progress, the GAO urged NARA to create a mitigation plan in case it could not process the incoming records by Jan. 20, 2009. In a report to congressional committees, the GAO said, "If it cannot ingest the electronic records from the Bush administration in a way that supports the search, processing and retrieval of records immediately after the presidential transition, it will not be able to meet the requirements of the Congress, the former and incumbent presidents, and the courts for information in these records in a timely fashion."

Thibodeau says there is no noteworthy risk that the system will not be ready. If there are data formats the system can't ingest and index in a reasonable amount of time, he says, the short-term solution will be to re-create the applications used for those records and to preserve and provide access to them that way.
