1
|
|
2
|
- Part of a 4(5)-year NSF-funded project
- supported by the Digital Libraries Initiative, Phase 2 (Grant No.
IIS-9905955, the Prism Project)
- Also partially funded by a grant from The Andrew W. Mellon Foundation
- Political Communications Web Archiving http://www.crl.edu/content/PolitWeb.htm
- For updates:
- http://irisresearch.library.cornell.edu/VRC/
|
3
|
- Anne R. Kenney, Research Advisor
- Nancy Y. McGovern, Project Manager
- Richard Entlich, Sr. Researcher
- William R. Kehoe, Technology Coordinator
- Ellie Buckley, Digital Research Specialist
|
4
|
- "Preservation Risk Management for Web Resources: Virtual Remote
Control in Cornell's Project Prism"
- by Kenney, McGovern, et al., in D-Lib Magazine, January 2002
- http://www.dlib.org/dlib/january02/kenney/01kenney.html
- "Virtual Remote Control: Building a Preservation Risk Management
Toolbox for Web Resources"
- by McGovern, Kenney, et al., in D-Lib Magazine, April 2004
- http://www.dlib.org/dlib/april04/mcgovern/04mcgovern.html
|
5
|
- because VRC develops models to represent essential features of selected
Web sites
- that enable ongoing monitoring over time
- to identify, respond to, and mitigate potential risks to the site
integrity and longevity
|
6
|
- because VRC is intended for use by cultural heritage institutions
- interested in the longevity of Web resources
- residing on remote servers,
- not owned or managed by the monitoring institution
|
7
|
- because at the most proactive end of the VRC approach
- a monitoring organization may act to protect another organization's
resources
- by agreement or implicit consent
- through notification and/or action
|
8
|
- Develop a model for research libraries (adaptable to other contexts)
- Support spectrum from passive monitoring to active capture
- Lifecycle support: selection to capture
- Understand nature of Web resources
- Promulgate good practice
|
9
|
- Two types of initiatives for monitoring and/or capture of:
- Web-based publications [Web site as a means]
- All (or a subset) of a Web site consisting of pages within a boundary
defined by a URL - or a portion of one [Web site as an end] (VRC)
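
The "boundary defined by a URL" can be made concrete with a small membership test. The following is a minimal sketch, not the VRC implementation: the function name `in_boundary` and the rule of matching on host plus path prefix are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def in_boundary(url: str, boundary: str) -> bool:
    """Return True if url lies inside the site boundary given as a URL prefix.

    Hypothetical helper: same host, and the path equals the boundary path
    or sits below it.
    """
    u, b = urlsplit(url), urlsplit(boundary)
    if u.scheme not in ("http", "https"):
        return False
    if (u.hostname or "").lower() != (b.hostname or "").lower():
        return False
    # Compare on path-segment borders so /VRC does not match /VRCold.
    base = b.path.rstrip("/")
    path = u.path or "/"
    return path == base or path.startswith(base + "/")
```

A boundary of `http://irisresearch.library.cornell.edu/VRC/` would then cover that directory and everything below it, while a boundary of `http://example.org/` would cover the whole host.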
|
10
|
- Two perspectives on Web-based risk:
- potential liability of an institution based upon the content of its Web
site, or a Web site for which it is responsible
- potential threats to the integrity and longevity of a Web resource (VRC)
|
11
|
- Include:
- technological obsolescence
- security weaknesses and breaches
- human error in developing/maintaining sites
- organizational issues; benign neglect
- power and technology failures
- inadequate backup and secondary systems
|
12
|
- Organizational Context
- Combination of indicators
- Monitoring (change/loss over time)
- Triggers (events, organizational, upgrades)
- Degradation of site management indicators
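
Monitoring for change and loss over time can be illustrated by comparing two snapshots of a site. This is a sketch under stated assumptions: a snapshot is taken to be a mapping from URL to a content fingerprint, and the names `fingerprint` and `compare_snapshots` are hypothetical.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Hash a page body so that two captures can be compared cheaply."""
    return hashlib.sha256(body).hexdigest()


def compare_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify URLs as lost, added, or changed between two snapshots.

    Each snapshot maps URL -> fingerprint (an assumed representation).
    """
    shared = old.keys() & new.keys()
    return {
        "lost": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(u for u in shared if old[u] != new[u]),
    }
```

A growing "lost" list across successive snapshots is one example of an indicator that could be combined with others before a risk is flagged.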
|
13
|
- Identification
- Analysis
- Appraisal
- Strategy
- Detection
- Response
|
14
|
- 1. Identification
- Human: identify Web resources of interest
- Toolbox: verify list, expand list
- 2. Analysis
- Toolbox: crawl sites, generate characterizations
- Human: accept/revise characterizations
- 3. Appraisal
- Human: define/review attributes of value
- Toolbox: support appraisal, capture results
|
15
|
- 4. Strategy
- Human: develop/review strategies
- Toolbox: plot appraisals, compile strategies
- 5. Detection
- Human: define risk parameters
- Toolbox: identify/assess risks; propose responses
- 6. Response
- Toolbox: propose risk response based on rules; automatic response for
some risk categories
- Human: monitor automated responses; select response based on
recommended actions
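
The division of labor in the Response stage, where the toolbox proposes a response from rules and a person monitors or selects, might be sketched as follows. The risk categories, the severity scale, and the rule table are all hypothetical; the slides do not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Risk:
    url: str
    category: str  # hypothetical label, e.g. "page_missing"
    severity: int  # assumed scale: 1 (low) to 5 (high)


@dataclass(frozen=True)
class Rule:
    category: str
    min_severity: int
    response: str
    automatic: bool  # True: the toolbox may act without waiting for a person


# Illustrative rules only; more specific rules are listed first.
RULES = [
    Rule("page_missing", 1, "recrawl and compare with last capture", True),
    Rule("server_outdated", 3, "notify site maintainer", False),
    Rule("server_outdated", 1, "log and continue monitoring", True),
]


def propose_response(risk: Risk, rules=RULES) -> Optional[Rule]:
    """Return the first matching rule, or None so that a person decides."""
    for rule in rules:
        if rule.category == risk.category and risk.severity >= rule.min_severity:
            return rule
    return None
```

Rules marked `automatic` correspond to the risk categories that get an automatic response; everything else is only a recommendation for the human reviewer.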
|
16
|
|
17
|
|
18
|
- Potential multi-site impact
- Server vulnerabilities put site content at risk
- Patches and new versions of Microsoft IIS and Apache servers are released
frequently
- Apache HTTP Server 1.3 security updates
- to version 1.3.26 on June 18, 2002
- to version 1.3.27 on October 3, 2002
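
One way a monitor could notice that a remote server lags behind a security release is to read the version from the HTTP `Server` response header, when the server chooses to expose it. The sketch below assumes that header is available; the function names and the same-release-line comparison are illustrative choices.

```python
import re


def parse_apache_version(server_header: str):
    """Extract (major, minor, patch) from a Server header, or None if absent."""
    match = re.search(r"Apache/(\d+)\.(\d+)\.(\d+)", server_header)
    return tuple(int(part) for part in match.groups()) if match else None


def is_behind(server_header: str, latest=(1, 3, 27)) -> bool:
    """True if the header reports an older patch level in the same release line."""
    version = parse_apache_version(server_header)
    # Servers may hide the version or run other software; then nothing is claimed.
    return version is not None and version[:2] == latest[:2] and version < latest
```

Because the check depends only on what the server reports, a `False` result means "no evidence of lag", not "known to be patched".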
|
19
|
|
20
|
- Identify tools for each stage (adopt, adapt, define, devise)
- Leverage existing tools; apply them to longevity
- Analyze steps - automated and manual
- Formalize protocol
- Provide a framework to map existing tools and plug gaps with new development
|
21
|
- Development steps:
- extensive literature review
- development of tool categories
- definition of categories and test protocols
- survey of existing tools for evaluation
- selection of representative tools for testing
- findings highlighted in category summaries
|
22
|
- traversing Web sites via links
- a capability common to most tools, but with different purposes and
results
- the VRC toolkit needs more than just Web crawlers
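
Link traversal itself is a small piece of logic; what differs between tools is what they keep from it. A minimal sketch using only the Python standard library follows. The `fetch` callable is an assumption standing in for real HTTP retrieval, so that the traversal can be shown without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute link targets with fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def traverse(start_url, fetch):
    """Breadth-first walk; fetch(url) returns HTML text, or None if unavailable."""
    seen, queue = {start_url}, deque([start_url])
    reached, broken = [], []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            broken.append(url)
            continue
        reached.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return reached, broken
```

A link checker would report only `broken`, a crawler would store the pages behind `reached`, and a site mapper would keep the link structure, which is why the same traversal yields different results in different tools.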
|
23
|
- Link checkers
- Site monitors
- Web crawlers
- Site managers
- Change Detectors
- Site Mappers (includes visualization)
- HTML Validators
|
24
|
|
25
|
|
26
|
|
27
|
- Frequency of capture – determined by
- nature of sites/pages
- events: technological, organizational
- resources
- Informed crawling
- Valuable vs. archival
|
28
|
- Fully document the site by capturing all changes to the pages/sites
- Capture significant changes to pages/sites
- Record periodic versions of the site
- Capture one-time copy of pages/sites
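
The four capture levels above can be read as policies that answer one question: given the last captured copy and the current page, should a new capture be made? The sketch below is an illustration only. The policy names, the 10% change threshold, the 30-day period, and the use of `difflib` to measure change are all assumptions.

```python
from difflib import SequenceMatcher
from enum import Enum


class Policy(Enum):
    ALL_CHANGES = 1          # fully document the site
    SIGNIFICANT_CHANGES = 2  # capture significant changes only
    PERIODIC = 3             # record periodic versions
    ONE_TIME = 4             # capture a single copy


def should_capture(policy, previous, current, days_since_capture,
                   threshold=0.10, period_days=30):
    """Decide whether to capture `current`, given the last captured text."""
    if previous is None:
        return True  # nothing captured yet; every policy takes a first copy
    if policy is Policy.ONE_TIME:
        return False
    if policy is Policy.PERIODIC:
        return days_since_capture >= period_days
    if policy is Policy.ALL_CHANGES:
        return current != previous
    # SIGNIFICANT_CHANGES: share of text that differs, from difflib's ratio.
    changed = 1.0 - SequenceMatcher(None, previous, current).ratio()
    return changed >= threshold
```

What counts as "significant" would in practice depend on the nature of the pages and on the attributes of value identified during appraisal; a text-similarity threshold is only one possible stand-in.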
|
29
|
- VRC Preservation Risk Management Program:
- Map stages to tool requirements
- Apply to potential organizational scenarios
- Enable risk/response scenario development
- Toolkit:
- Revise and populate tool inventory
- VRC Control Site
|
30
|
- Develop approach for building human sexuality collection: capturing
Weblogs and other Internet communications
- State Government Web site case study
- Demonstrators for toolkit scenarios
|