Lots of Copies Keep Stuff Safe
LOCKSS
| Mellon Foundation | |
| 2/6/2001 | |
| [updated 2/14/01] |
| Building a Digital Preservation Internet Appliance |
Librarians Keep Paper Publications Accessible
| Distribute & house copies worldwide | |
| Loan copies to libraries on request | |
| Readers find a copy easily | |
| It is hard to find & destroy all copies |
Librarians Currently Ensure Documents Are Not “Unpublished”
| Publisher takeovers, buyouts, etc. | |
| Malicious act | |
| Natural disaster | |
| Being lost | |
| Official edict | |
| Simply by taking actions to support their local communities | |
| Be affordable | ||
| Cheap PC, Open-source software | ||
| Low administration “appliance” | ||
| Have low probability of failure | ||
| Many replicas, Resists attack, No secrets | ||
| Scale to enormous rates of publishing | ||
| Preserve access | ||
| Links resolve, Searches work | ||
| Conform to publishers’ access controls | ||
| Libraries take custody of content | ||
| Provides a simple web cache that | ||
| Never gets flushed | ||
| Holds authorized content | ||
| The cache | ||
| Pre-fetches content as published | ||
| Continuously validates against other caches | ||
| Repairs gaps from publisher and other caches | ||
| Persistence via redundancy | ||
| Not via media archiving | ||
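The cache behavior listed above (never flushed, pre-fetching, validation against other caches, repair from the publisher or other caches) can be sketched roughly as follows. This is a minimal illustration, not the LOCKSS implementation: the class name `PersistentCache`, its method names, the use of SHA-256 digests, the example URL, and the in-memory dictionaries standing in for the publisher and the peer caches are all assumptions made for the example.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional


def digest(content: bytes) -> str:
    """Hex digest used to compare copies (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


class PersistentCache:
    """Hypothetical cache that never evicts and repairs itself from other copies."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}  # url -> content, never flushed

    def prefetch(self, publisher: Dict[str, bytes]) -> None:
        """Collect content as it is published, without waiting for a reader."""
        for url, content in publisher.items():
            self.store.setdefault(url, content)

    def consensus(self, url: str, peers: List["PersistentCache"]) -> Optional[str]:
        """Most common digest for this URL among the peers, or None if no peer has it."""
        votes = Counter(digest(p.store[url]) for p in peers if url in p.store)
        return votes.most_common(1)[0][0] if votes else None

    def is_valid(self, url: str, peers: List["PersistentCache"]) -> bool:
        """True if the local copy exists and matches what most peers hold."""
        if url not in self.store:
            return False
        agreed = self.consensus(url, peers)
        return agreed is None or digest(self.store[url]) == agreed

    def repair(self, url: str, publisher: Dict[str, bytes],
               peers: List["PersistentCache"]) -> bool:
        """Replace the local copy, trying the publisher first and then the peers."""
        agreed = self.consensus(url, peers)
        sources = [publisher] + [p.store for p in peers]
        for source in sources:
            copy = source.get(url)
            if copy is not None and (agreed is None or digest(copy) == agreed):
                self.store[url] = copy
                return True
        return False


# Usage: four library caches collect one article, one copy is damaged,
# and the damage is repaired even though the publisher is unavailable.
url = "http://journal.example.org/vol1/article1"  # hypothetical URL
publisher = {url: b"original article"}
caches = [PersistentCache("library-%d" % i) for i in range(4)]
for cache in caches:
    cache.prefetch(publisher)

caches[0].store[url] = b"tampered"
damage_detected = not caches[0].is_valid(url, caches[1:])
repaired = caches[0].repair(url, {}, caches[1:])  # empty dict: publisher "turned off"
```

The point of the sketch is that persistence comes from comparing and copying between replicas, not from any one copy being stored on special media.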
| Any format | ||
| gif, jpeg, html, video, audio | ||
| Delivered through HTTP | ||
| More or less immutable | ||
| Not intended for dynamic content | ||
| Good match for peer-reviewed articles | ||
| Support | ||
| National Science Foundation | ||
| Sun Microsystems Labs | ||
| Stanford University Libraries | ||
| Mellon Foundation | ||
| Software | ||
| Technical design complete | ||
| Prototype working | ||
| Alpha test ends winter 2001 | ||
| Beta test starts spring 2001 | ||
| LOCKSS is feasible | |||
| ~15 caches, 10 months, ~160MB of Science Online | |||
| Collected content, detected & repaired deliberate damage | |||
| Survived fire, relocation, flaky hardware | |||
| Basic mechanisms work | |||
| Mixed multicast/unicast communication | |||
| Over-replicated fault tolerance by “opinion poll” | |||
| Linux-based “internet appliance” | |||
| But work needed before beta | |||
| Administrator GUI | |||
| Repair damage from other caches | |||
| Hardening against attack | |||
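One way to picture fault tolerance by “opinion poll” is a tally of content digests reported by other caches. The sketch below is an assumption-laden illustration, not the LOCKSS protocol: the function name `tally_poll`, the `quorum` and `landslide` thresholds, and the three outcome labels are all hypothetical. It relies on there being many more replicas than strictly needed, so that a clear majority can be demanded before any copy is declared damaged.

```python
from collections import Counter
from typing import List


def tally_poll(local_digest: str, peer_digests: List[str],
               quorum: int = 5, landslide: float = 0.8) -> str:
    """Classify one poll as 'agree', 'repair', or 'inconclusive'.

    quorum and landslide are illustrative thresholds (assumptions):
    too few votes, or no clear majority, means the poll decides nothing.
    """
    if len(peer_digests) < quorum:
        return "inconclusive"
    top_digest, count = Counter(peer_digests).most_common(1)[0]
    if count / len(peer_digests) < landslide:
        return "inconclusive"
    # A landslide exists: the local copy either matches it or needs repair.
    return "agree" if top_digest == local_digest else "repair"


# Usage with made-up digests
unanimous = tally_poll("aaa", ["aaa"] * 5)
damaged = tally_poll("bbb", ["aaa"] * 5)
split = tally_poll("aaa", ["aaa", "aaa", "bbb", "bbb", "aaa"])
too_few = tally_poll("aaa", ["aaa"] * 3)
```

Requiring a landslide instead of a bare majority is a design choice in this sketch: a narrow result is treated as a sign of trouble and left undecided, so it does not trigger an overwrite.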
| American Association for the Advancement of Science, American Physical Society, Federation of American Societies for Experimental Biology, Biophysical Society, Annual Reviews, Rockefeller University Press, The Endocrine Society, American Society for Biochemistry and Molecular Biology, American Association for Clinical Chemistry, National Academy of Sciences, British Medical Journal, American Psychiatric Publishing Inc., Oxford University Press, Company of Biologists Ltd, New England Journal of Medicine, American Society for Clinical Investigation, Radiological Society of North America, Society for General Microbiology, The Histochemical Society, American Thoracic Society, BMJ Publishing Group, American Society of Neuroradiology, Lipid Research Inc., American Society for Investigative Pathology, American Society of Plant Physiologists, The Royal College of Psychiatrists, Society for the Study of Reproduction, American Society for Microbiology, Cold Spring Harbor Lab Press, American Society for Pharmacology & Experimental Therapeutics |
| Libraries | ||
| ~60 widely distributed & varyingly configured caches | ||
| Test security, usability, performance | ||
| Journals | ||
| Not using real journals’ URLs | ||
| Simulating content [Science, PNAS, JBC, BMJ, a few US Gov Docs] on shadow servers | ||
| Isolate LOCKSS data streams & measure traffic | ||
| Test the system by turning off the publisher | ||
| If it works, LOCKSS | |
| will provide access to content | |
| for many future generations | |
| Disclaimer: monolithic, homogeneous solutions are likely to fail; many digital preservation approaches are required |