How LOCKSS Works
LOCKSS covers the entire digital preservation lifecycle.
Whether you are building a new network or using an existing one, the following offers a high-level view of the digital preservation lifecycle in LOCKSS.
Content to be preserved is characterized by diverse formats, packaging, and source platforms. Accordingly, LOCKSS systems employ a wide variety of ingest mechanisms, providing for effective, efficient, and flexible processing of content:
| For this kind of content or source platform | These are common ingest mechanisms |
|---|---|
| Institutional repositories | LOCKSS-O-Matic, OAI-PMH |
| Bagged content stored on disk | LOCKSS-O-Matic |
| Large- or medium-sized publisher | FTP, LOCKSS web harvester, OAI-PMH, rsync |
| Small non-OJS hosted publisher | LOCKSS web harvester, wget |
| OJS-hosted publisher | Direct, automated deposit into the PKP Preservation Network |
| Web content | Archive-It, wget, other archival web capture agents |
LOCKSS systems can also parse descriptive metadata that is embedded in, or presented alongside, the content to be preserved. Examples of supported metadata sources include an XML file in a declared schema, a BagIt metadata tag file, a RIS citation file found at a web address path identified by a regular expression, or text strings at a predictable location in an HTML DOM tree.
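To make the RIS case concrete, here is a minimal sketch of the idea in Python. It is not LOCKSS code: the URL pattern `/article/<id>/ris`, the function names, and the sample record are all assumptions chosen for illustration. It shows only the two steps named above: recognizing a citation file by a regular expression on its path, then reading the tagged lines into a metadata record.

```python
import re

# Hypothetical site layout: citation exports live at /article/<id>/ris.
RIS_URL_PATTERN = re.compile(r"^/article/\d+/ris$")

# An RIS line is a two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def is_ris_path(path: str) -> bool:
    """Return True if the URL path looks like a citation export (assumed layout)."""
    return RIS_URL_PATTERN.match(path) is not None


def parse_ris(text: str) -> dict[str, list[str]]:
    """Collect the values of each RIS tag in the first record of `text`."""
    record: dict[str, list[str]] = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if match is None:
            continue  # skip blank or malformed lines
        tag, value = match.groups()
        if tag == "ER":  # "ER" marks the end of a record
            break
        record.setdefault(tag, []).append(value.strip())
    return record


SAMPLE = "\n".join([
    "TY  - JOUR",
    "AU  - Doe, Jane",
    "AU  - Roe, Richard",
    "TI  - An Example Article",
    "PY  - 2020",
    "ER  - ",
])

if is_ris_path("/article/42/ris"):
    metadata = parse_ris(SAMPLE)
    print(metadata["TI"], metadata["AU"])
```

Repeated tags such as `AU` are kept as lists so that multi-author records survive intact.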
A web-accessible interface makes it easy to select new content for preservation, monitor preservation activity, and update the LOCKSS system configuration. Three audit and verification tools detail what content is stored in an individual LOCKSS system and its preservation status:
- The interface provides detailed preservation status information for each Archival Unit, which is a meaningful aggregation of content (e.g., a discrete subset of a digital collection, a volume of a journal, or a book).
- A LOCKSS system administrator can use a properly configured web browser from an authorized IP address to view preserved content through an audit proxy. This presents the content as it was collected by the LOCKSS system.
- If the content is a book or journal (i.e., an object with associated bibliographic metadata), the LOCKSS system can produce a KBART report of the locally preserved content.
The LOCKSS system continually monitors and verifies the integrity of its stored content by cooperating with other LOCKSS systems in the same network to compare copies. This activity takes place both during and after ingest:
- As ingest proceeds, the monitoring technology ensures that each LOCKSS system in the network collects all of the intended content, thus preserving the authoritative version.
- After ingest is complete, the LOCKSS systems in the network confer at regular intervals to determine whether any content has been damaged or lost, and can arrange for content repair from one another. The administrator of each LOCKSS system can monitor the preservation status of stored content by consulting the user interface.
A sophisticated protocol prevents nodes from falsifying or feigning results in the comparison operations and authorizes repairs only when agreement exceeds certain consensus thresholds. This approach is resilient to a broad array of threats.
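The consensus idea can be illustrated with a deliberately simplified sketch. This is not the LOCKSS polling protocol, which includes defenses against falsified votes that are omitted here. The function names, the use of SHA-256, and the 0.75 threshold are assumptions made for the example. It shows only the core decision: a local copy is repaired only when a sufficiently large share of peers agree on a different version.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a copy of the content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def evaluate_poll(local: bytes, peer_digests: list[str], threshold: float = 0.75) -> str:
    """Compare the local copy with peers' fingerprints.

    Returns "agree" if the local copy matches the consensus version,
    "repair" if peers reach consensus on a different version, and
    "inconclusive" if no version reaches the (hypothetical) threshold.
    """
    if not peer_digests:
        return "inconclusive"
    winner, votes = Counter(peer_digests).most_common(1)[0]
    if votes / len(peer_digests) < threshold:
        return "inconclusive"  # not enough agreement to authorize anything
    return "agree" if digest(local) == winner else "repair"


good = b"Journal of Examples, volume 12"
peers = [digest(good)] * 4 + [digest(b"corrupted copy")]

print(evaluate_poll(good, peers))
print(evaluate_poll(b"bit-rotted local copy", peers))
```

Requiring a threshold rather than a simple majority means that a small number of damaged or dishonest peers cannot trigger a repair on their own.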
An individual LOCKSS system can both preserve and provide access to stored content. There are three ways that a LOCKSS system can deliver content: by proxying (i.e., acting like a web cache), by serving (acting like a web server), and through integration with an OpenURL resolver.
- Institutions run web proxies to allow off-campus users to access restricted content. When configured for proxy access, the LOCKSS system ensures that content requests are seamlessly fulfilled when the content is otherwise unavailable from its original source.
- In the basic serving model, preserved content is provided from local web addresses corresponding to the LOCKSS system. The LOCKSS system checks whether the original source will provide content to fulfill a given request. If the content is not available from the original source, the LOCKSS system serves its own copy.
- Institutions can make content stored in their LOCKSS system discoverable and accessible through their discovery systems by adding the LOCKSS system as a target in their OpenURL resolver.
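The fallback behavior described for the basic serving model can be sketched as follows. This is an illustrative outline under stated assumptions, not the LOCKSS implementation: `serve`, `fetch_origin`, and the in-memory `preserved` dictionary are hypothetical names, and the origin lookup is passed in as a function so the example runs without network access.

```python
from typing import Callable, Optional

# A fetcher returns the response body, or None if the origin cannot supply it.
Fetcher = Callable[[str], Optional[bytes]]


def serve(url: str, fetch_origin: Fetcher, preserved: dict[str, bytes]) -> tuple[str, Optional[bytes]]:
    """Prefer the original source; fall back to the preserved copy."""
    body = fetch_origin(url)
    if body is not None:
        return ("origin", body)
    if url in preserved:
        return ("preserved", preserved[url])
    return ("missing", None)


# Hypothetical preserved store and two stand-in origins.
store = {"http://example.org/a": b"preserved copy"}


def origin_up(url: str) -> Optional[bytes]:
    return b"live copy"


def origin_down(url: str) -> Optional[bytes]:
    return None


print(serve("http://example.org/a", origin_up, store))
print(serve("http://example.org/a", origin_down, store))
```

Checking the origin first keeps the publisher as the primary source while it remains available, which matches the order of operations described above.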
Each community determines its own access policies for content stored in a LOCKSS network, with consideration for content rights and restrictions, governing values, and technical implementation choices. To date, LOCKSS networks have been either dark (for access, the content is transferred from the preservation network to a separate access platform) or gray (access is provided from the LOCKSS network, automatically, when the content is unavailable from its original source).