Content preserved in a LOCKSS cache is served to users by a proxy server. To access it, either a user's browser can be configured to send (selected) proxy requests to the cache, or a departmental or institutional proxy server can be configured to do the same for a community of users.
The cache normally defers to the publisher site, and serves its own preserved content only if there is no newer content available from the publisher. That is, each request received is first forwarded to the publisher's site, and if the publisher responds with content, that response is sent back to the user. If the publisher does not have the requested content, or it indicates that the cache has an up-to-date copy, or it fails to respond within a short time, the cache serves its preserved copy. The result is transparent to the user.
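The decision described above can be sketched as a small function. This is only an illustration of the rules as stated here, not the LOCKSS daemon's actual code: the names `Source` and `choose_source` are hypothetical, and it assumes that "the publisher has the content" means a 2xx status, and that a publisher error is passed through when the cache holds no copy.

```python
from enum import Enum
from typing import Optional


class Source(Enum):
    PUBLISHER = "publisher"
    CACHE = "cache"
    NONE = "none"


def choose_source(publisher_status: Optional[int], cache_has_copy: bool) -> Source:
    """Pick where a response comes from in normal (non-audit) proxy mode.

    publisher_status is the HTTP status returned by the publisher, or None
    if the publisher did not respond within the timeout.
    """
    # Assumption: a 2xx status means the publisher responded with content.
    if publisher_status is not None and 200 <= publisher_status < 300:
        return Source.PUBLISHER
    # No content, 304 (Not Modified), or a timeout: serve the preserved copy.
    if cache_has_copy:
        return Source.CACHE
    # Assumption: with no preserved copy, pass the publisher's answer through.
    if publisher_status is not None:
        return Source.PUBLISHER
    return Source.NONE
```

For example, `choose_source(304, True)` and `choose_source(None, True)` both select the cache, while `choose_source(200, True)` selects the publisher.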
Verifying Stored Material
It is likely that you will wish to confirm that your LOCKSS box has correctly retrieved and stored an Archival Unit. There are two methods you can use to monitor the content stored.
Viewing the Status of an Archival Unit
From the cache administration interface, select Daemon Status -> Archival Units. You will be presented with a list of the Archival Units (AUs) currently stored in your LOCKSS machine. Selecting the link corresponding to an Archival Unit will display the current status of that Archival Unit.
Of note here are the Disk Usage, the current Status, whether the AU is still Available From Publisher, and the tree of individual files and directories that have been stored for the Archival Unit. You can also view individual files stored in your LOCKSS machine from this page by selecting individual NodeUrls. Note, however, that you will not be able to navigate between linked pages contained in the same Archival Unit. Currently, that requires the LOCKSS content audit, described below.
Auditing a Cache's Contents
When integrated with your institutional proxy, the LOCKSS proxy server does not provide an easy way to audit the contents of a cache. This is because the content returned may come from the publisher; it is served from the LOCKSS cache only if it is unavailable from the publisher or the cached copy is up to date (meaning that the publisher responds to an If-Modified-Since request with a 304 (Not Modified) status). To verify that a cache actually holds the content it claims to, the LOCKSS machine can be configured to operate as an audit proxy server. In this mode, the LOCKSS proxy never forwards requests to a publisher and serves only locally preserved content.
From the cache administration interface, select Proxy Options and check the "Enable audit proxy" box, then click Update Proxy. This will configure the machine to listen for HTTP requests on the designated port (by default, port 8082). Please also ensure that the IP address of the machine you will be using to audit the proxy is in the Allow Access list available on the Proxy Access Control page. You can now configure your web browser to proxy all HTTP requests to the designated port on the cache.
Now, if you fetch a URL that is preserved in the cache you will see its contents, but any other URL will return "404 Not Found". Note that although the content may appear to be delivered from the original URL, it is being served from the local LOCKSS proxy server. In addition, even within cached content there may be broken links or images, as some pages point to resources that are not part of any preserved AU (e.g., off-site links). The LOCKSS proxy will return a 404 error for these. When you have finished auditing your content, you will need to revert your browser’s proxy configuration to its original settings and disable the audit proxy on the LOCKSS cache.
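As an alternative to reconfiguring a browser, the same check could be scripted. The following is a minimal sketch using only the Python standard library. The function names and the host `lockss.example.edu` are hypothetical, and it assumes the audit proxy is listening on the default port 8082 and that the auditing machine is in the Allow Access list.

```python
import urllib.error
import urllib.request


def build_audit_opener(cache_host: str, audit_port: int = 8082) -> urllib.request.OpenerDirector:
    """Return an opener that sends all HTTP requests through the audit proxy."""
    proxy = urllib.request.ProxyHandler({"http": f"http://{cache_host}:{audit_port}"})
    return urllib.request.build_opener(proxy)


def audit_url(opener: urllib.request.OpenerDirector, url: str, timeout: float = 10.0) -> str:
    """Report whether the audit proxy serves a preserved copy of url."""
    try:
        with opener.open(url, timeout=timeout) as response:
            if response.status == 200:
                return "preserved"
            return f"unexpected status {response.status}"
    except urllib.error.HTTPError as err:
        # The audit proxy answers 404 for anything it does not hold.
        if err.code == 404:
            return "not preserved"
        return f"unexpected status {err.code}"
    except OSError as err:
        # Covers URLError, refused connections, and timeouts.
        return f"proxy unreachable: {err}"


if __name__ == "__main__":
    # Hypothetical cache host and URL; replace with your own.
    opener = build_audit_opener("lockss.example.edu")
    print(audit_url(opener, "http://www.example.org/journal/vol1/index.html"))
```

Because the proxy is set only on this opener, no browser or system proxy settings need to be reverted afterwards; the audit proxy on the LOCKSS cache should still be disabled once the audit is complete.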
