How It Works
A library uses the LOCKSS software to turn a low-cost PC into a digital preservation appliance (a LOCKSS Box) that performs four functions:
- It collects newly published content from the target e-journals using a web crawler similar to those used by search engines.
- It continually compares the content it has collected with the same content collected by other LOCKSS Boxes, and repairs any differences.
- It acts as a web proxy or cache, providing browsers in the library's community with access to the publisher's content or the preserved content as appropriate.
- It provides a web-based administrative interface that allows the library staff to target new journals for preservation, monitor the state of the journals being preserved, and control access to the preserved journals.
Collecting
Before LOCKSS Boxes can preserve a journal, two things have to happen:
- The publisher has to give permission for the LOCKSS system to collect and preserve the journal. They do this by adding a page to the journal's web site containing a permission statement, and links to the issues of the journal as they are published.
- The LOCKSS Box has to know where to find this page, how far to follow the chains of web links so that it doesn't crawl off the edge of the journal and try to collect the whole Web, some bibliographic information, and so on. To support new publishing platforms, the LOCKSS system provides a fill-in-the-blanks tool that a librarian or administrator can use to collect this information and test that it is correct. The information is then saved in a file (a LOCKSS plugin) and added to the publisher's web site or to some other plugin repository, so that it is available to all LOCKSS systems.
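The two checks described above (permission, and keeping the crawl inside the journal) can be sketched as follows. This is a minimal illustration, not the LOCKSS crawler or plugin format: the function names, the idea of passing the permission statement as a plain string, and the depth limit parameter are all assumptions made for the example.

```python
from urllib.parse import urlparse


def has_permission(page_text, permission_statement):
    """Return True if the publisher's page contains the permission statement.

    Both arguments are plain strings; the actual statement wording is
    supplied by the caller (hypothetical interface).
    """
    return permission_statement in page_text


def in_crawl_scope(url, base_url, depth, max_depth):
    """Return True if url lies under base_url and within max_depth links.

    depth is the number of links followed from the starting page so far.
    """
    if depth > max_depth:
        return False
    target, base = urlparse(url), urlparse(base_url)
    # Stay on the same scheme and host as the journal.
    if (target.scheme, target.netloc) != (base.scheme, base.netloc):
        return False
    # Stay under the journal's path prefix (so /vol1/ does not match /vol10/).
    base_path = base.path if base.path.endswith("/") else base.path + "/"
    return target.path == base.path or target.path.startswith(base_path)
```

A crawler built this way would follow a link only when `in_crawl_scope` returns True, which is what keeps it from wandering off the journal onto the rest of the Web.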
Preserving and Auditing
The LOCKSS Boxes at libraries around the world use the Internet to continually audit the content they are preserving. At intervals, LOCKSS Boxes take part in polls, voting on the digest of the portion of the content they have in common. If the content in one LOCKSS Box is damaged or incomplete, that LOCKSS Box will lose the poll and will have to repair its content from the copies held by other LOCKSS Boxes. This cooperation between the LOCKSS Boxes avoids the need to back them up individually. It also provides unambiguous reassurance that the system is performing its function and that the correct content will be available to readers when they try to access it. The more organizations that preserve a given body of content, the stronger the guarantee that they will all have continued access to it.
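The idea of voting on a digest and repairing the loser can be illustrated with a deliberately simplified sketch. It assumes a plain majority vote over SHA-256 digests held in one in-memory dictionary; the real LOCKSS polling protocol runs between separate machines and is considerably more involved. The names `run_poll` and `repair` are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(content).hexdigest()


def run_poll(copies):
    """Hold a simplified poll over copies, a dict of box name -> bytes.

    Returns (winning_digest, losers), where losers lists the boxes whose
    digest differs from the most common one.
    """
    votes = {box: digest(data) for box, data in copies.items()}
    winner, _count = Counter(votes.values()).most_common(1)[0]
    losers = [box for box, d in votes.items() if d != winner]
    return winner, losers


def repair(copies, losers, winner):
    """Overwrite each losing box's copy with a copy matching the winning digest."""
    good = next(data for data in copies.values() if digest(data) == winner)
    for box in losers:
        copies[box] = good
```

With three boxes where one holds a damaged copy, `run_poll` reports that box as the loser, and after `repair` a second poll finds no disagreement.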
Providing Access
LOCKSS Boxes provide transparent access to the content they preserve. Institutions often run web proxies, to allow off-campus users to access their journal subscriptions, and web caches, to reduce the bandwidth cost of providing Web access to their community. Their LOCKSS Box integrates with these systems, intercepting requests from the community's browsers to the journals being preserved. When a request for a page from a preserved journal arrives, it is first forwarded to the publisher. If the publisher returns content, that is what the browser gets. Otherwise the browser gets the preserved copy.
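The publisher-first, preserved-copy-second behaviour described above can be sketched as a small function. This is an illustration under stated assumptions, not the LOCKSS proxy code: `fetch_from_publisher` is a hypothetical callable that returns the page bytes, or returns None or raises OSError when the publisher does not supply content, and `preserved` is a hypothetical dictionary standing in for the box's stored copies.

```python
def serve(url, fetch_from_publisher, preserved):
    """Return the publisher's content if available, else the preserved copy.

    fetch_from_publisher: callable taking a URL, returning bytes or None.
    preserved: dict mapping URL -> bytes for content the box has collected.
    Returns None if neither source has the page.
    """
    try:
        content = fetch_from_publisher(url)
    except OSError:
        # Treat network failures the same as "publisher returned nothing".
        content = None
    if content is not None:
        return content
    return preserved.get(url)
```

Because the fallback happens inside the request path, the reader's browser does not need to know which source answered.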
Administering
Library staff administer their LOCKSS Box via a Web user interface. The interface lets staff add new content for preservation, monitor the preservation of existing content, control access to the appliance, and perform a wide variety of other functions.
OAIS, Format Migration, Auditing Requirements
The LOCKSS system is OAIS compliant; see the OAIS Formal Statement of Compliance.
The LOCKSS system moves content forward in time through a process called format migration. See David S. H. Rosenthal, Thomas Lipkis, Thomas Robertson, Seth Morabito, "Transparent Format Migration of Preserved Web Content," D-Lib Magazine, Volume 11, Number 1, January 2005. http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html
Auditing standards for digital preservation systems are in their infancy, and they will evolve as the technology and the community's understanding of the problem space mature. The Library of Congress NDIIPP program and the LOCKSS Program are contributing to the community's understanding by developing audit protocols and best standards for security and bit preservation (see the press release).



