LOCKSS Perspectives on the NDSA Levels of Digital Preservation
The following post was authored by Nicholas Taylor, Program Manager for LOCKSS and Web Archiving.
In 2013, a team of librarians, archivists, curators, engineers, and other technologists workshopped and then published the NDSA Levels of Digital Preservation, a resource outlining high-level, progressive enhancements in digital preservation practices across a number of dimensions. A naïve review of its application in the intervening time suggests that it is regarded as a legitimate assessment and planning tool, and a foundation upon which to articulate new and/or improved preservation best practices.
In April 2018, the NDSA Coordinating Committee launched an effort, now underway in the form of the "Levels Reboot," to update the guidelines and to establish a process for continuing to update them. Responding to the survey in the kick-off solicitation to community mailing lists, the LOCKSS Program expressed interest in providing feedback in a number of areas, which we’ve gone ahead and outlined below:
Storage and Geographic Location
- The NDSA Levels of Preservation recognizes that more copies, shy of what we might consider lots of copies, help to ensure that at least one copy will remain intact long enough to repair the others, if needed. Over the term of a preservation commitment, though, there is a strong possibility that system outages will render one or more copies unavailable for extended periods. To increase the likelihood of maintaining a majority quorum of copies with agreeing fixity values in spite of such outages, we endorse four or more copies as a preservation enhancement.
- The NDSA Levels of Preservation recognizes the importance of diversification for minimizing correlated risks. Heterogeneity in storage media, in the geographic location of copies, and in the natural disaster threats to which copies are vulnerable tends to increase the level of preservation. To this list, we'd add software and organizational infrastructure as key areas where diversification yields more resilient preservation. If closely-coupled software manages multiple, geographically-distributed, disaster-diversified copies stored on diverse storage media, all copies are simultaneously vulnerable to user error or malicious attack by a privileged insider, an outside attacker who compromises the system, or a bug in the software that propagates unintentional damage. If all copies are stewarded by a single organization, the persistence of the content depends not just on technical infrastructure, but also on that one organization's survival and its unfailing financial and political commitment. These two factors are arguably more significant causes of real-world data loss than the robustness of the technical infrastructure.
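To make the quorum argument concrete, here is a minimal Python sketch. It is an illustration only, not the LOCKSS polling protocol: the function name `majority_digest`, the copy names, and the rule that a quorum means a strict majority of the copies that responded are all assumptions made for this example. An unavailable copy is represented as `None`.

```python
import hashlib
from collections import Counter
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def majority_digest(reported: dict[str, Optional[str]]) -> Optional[str]:
    """Return the digest shared by a strict majority of responding copies.

    `reported` maps a (hypothetical) copy name to its digest, or to None
    if that copy is currently unavailable. Returns None if no digest
    wins a strict majority among the copies that responded.
    """
    available = [d for d in reported.values() if d is not None]
    if not available:
        return None
    digest, votes = Counter(available).most_common(1)[0]
    return digest if votes * 2 > len(available) else None


good = sha256_hex(b"preserved content")
damaged = sha256_hex(b"preserved c0ntent")

# Three copies: one offline, one damaged. The two responders disagree.
three_copies = {"site-a": good, "site-b": damaged, "site-c": None}

# Four copies: one offline, one damaged. Two of three responders agree.
four_copies = {"site-a": good, "site-b": good, "site-c": damaged, "site-d": None}

print(majority_digest(three_copies) is None)   # no quorum
print(majority_digest(four_copies) == good)    # quorum on the intact digest
```

Under these assumptions, three copies cannot absorb one outage plus one damaged copy, while four copies can, which is the kind of margin the recommendation of four or more copies is meant to provide.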
File Fixity and Data Integrity
- The NDSA Levels of Preservation supposes that fixity information can assure the integrity of other digital information without itself being subject to the same risks. Enhancements to this preservation dimension would include protecting fixity information with additional security controls, storing fixity information on different storage than the content, and ensuring that no single system or user has privileged access to both the fixity information and the content.
- The NDSA Levels of Preservation recognizes that detection of damage to content doesn't do much good if you can't also repair it. Repairs are potentially dangerous operations, though; care must be taken to ensure that bad copies don't inadvertently (or intentionally) overwrite good ones. Distinguishing bad copies from good requires confidence that whatever process compromised the integrity of the content (e.g., media failure, system failure, software bug, user error, internal attack, or external attack) did not also compromise the fixity information.
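The repair concern can be sketched in a few lines of Python. This is a simplified illustration, not a description of how LOCKSS or any particular system performs repairs. It assumes that a trusted digest is held separately from the content and has not been compromised, and it models copies as an in-memory dictionary; the names `repair_copies` and `trusted_digest` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def repair_copies(copies: dict[str, bytes], trusted_digest: str) -> list[str]:
    """Overwrite damaged copies using a copy verified against trusted_digest.

    `copies` maps a (hypothetical) copy name to its bytes. A copy is used
    as a repair source only after it has been verified, and only copies
    that fail verification are overwritten. If no copy verifies, nothing
    is changed and an error is raised.
    """
    verified = [name for name, data in copies.items()
                if sha256_hex(data) == trusted_digest]
    if not verified:
        raise ValueError("no copy matches the trusted digest; refusing to repair")
    source = copies[verified[0]]
    repaired = []
    for name in list(copies):
        if sha256_hex(copies[name]) != trusted_digest:
            copies[name] = source
            repaired.append(name)
    return repaired


original = b"preserved content"
trusted = sha256_hex(original)  # assumed to be stored apart from the content
copies = {"site-a": original, "site-b": b"bit-rotted content", "site-c": original}
repaired = repair_copies(copies, trusted)
print(repaired)
```

The important property is the refusal to act when no copy verifies: if the fixity information itself were damaged, every good copy would fail the check, and a repair process without this guard could overwrite good copies with bad ones.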
Considering the NDSA Levels of Preservation overall, the activities associated with each level become progressively more resource-intensive. In the context of preservation planning, though, it's important to recognize that resources aren't just a constraint on the level of preservation that can be achieved; they also limit the amount of content that can be preserved, and these two axes of preservation trade off with one another. Deliberately cleaving to a lower level of preservation on some dimensions may be an optimum strategy for assuring that more content be subject to at least a base level of protection. This makes sense given that the greatest source of content loss is the failure to archive it in the first place.