DLF organized and conducted a workshop on social science data at Princeton University in January 1999. Social science data managers from DLF institutions joined a variety of experts to examine the state of the art in three areas: the discovery and retrieval of databases, the evaluation and interpretation of alternative data sources, and data extraction for analysis and presentation. Speakers included prominent faculty in the field, such as Gary King, of Harvard; Richard Rockwell, of the Inter-university Consortium for Political and Social Research (ICPSR); and Daniel Greenstein, of the Arts and Humanities Data Service in the United Kingdom. Participants identified a set of activities that the DLF can undertake to advance the state of the art in these three areas, with the goal of improving the use of social science databases in the undergraduate curriculum. The workshop report (http://www.diglib.org/collections/ssda/ssdaresults.htm) outlines an agenda for action that includes the following:
There are not enough skilled staff, nor is the infrastructure sufficient, to create and disseminate SGML- and XML-encoded documentation on a large scale. To address the problem, library and other institutional leaders must be informed about the needs of quantitative data services and encouraged to plan strategically to meet the needs of research and teaching within and across disciplines.
Independent, uncoordinated, and duplicative work is under way on most campuses to develop mechanisms that enable faculty and students to use key data sets. These access mechanisms must be designed to allow scientific replication of data analysis. One solution would be to devise a master plan that systematically divides the labor of developing access tools among campuses. An alternative would be for institutions to contribute work within a common but distributed framework of data repositories and tools.
The development and deployment of a new, SGML-based standard for documenting data sets in codebooks will require campus investments, the conversion of existing codebooks to digital form, and research to understand how online codebooks will relate to other discovery and access tools for numeric data. (A sketch of how a machine-readable codebook might serve such tools follows this list.)
Strategies are needed to address the dual challenge of digital data preservation: maintaining the tools needed to read digital data files while also preserving the codebooks needed to interpret them.
Tools and other facilities must be developed to help users understand and manage comparability across related data sets.
More attention should be paid to the cataloging of data sets.
Consortia should be developed, or existing consortia used, to negotiate the purchase and licensing of critical and expensive data sets.
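The codebook item above asks how online codebooks will relate to discovery and access tools. As a minimal sketch of what structured documentation makes possible, the following Python fragment reads a small codebook and lists its variables, labels, and category codes. The markup and element names are hypothetical illustrations of the kind of SGML/XML encoding the workshop discussed, not an actual standard, and the sample study is invented.

    import xml.etree.ElementTree as ET

    # A hypothetical XML codebook fragment. The element names are
    # illustrative only; they do not reproduce any actual standard.
    CODEBOOK = """
    <codebook study="Hypothetical Campus Survey, 1998">
      <variable name="AGE">
        <label>Age of respondent in years</label>
      </variable>
      <variable name="PARTYID">
        <label>Political party identification</label>
        <category value="1">Democrat</category>
        <category value="2">Republican</category>
        <category value="3">Independent</category>
      </variable>
    </codebook>
    """

    def describe_variables(codebook_xml: str) -> None:
        """Print each variable's name, label, and coded categories."""
        root = ET.fromstring(codebook_xml)
        print(f"Study: {root.get('study')}")
        for var in root.findall("variable"):
            print(f"  {var.get('name')}: {var.findtext('label', default='(no label)')}")
            for cat in var.findall("category"):
                print(f"    {cat.get('value')} = {cat.text}")

    describe_variables(CODEBOOK)

A discovery tool built on such encoding could index variable labels and category codes directly, which is what makes replication of analyses and comparison across related data sets tractable.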
The DLF, CLIR, and RLG have created an editorial board of experts to review the state of the art in visual resource imaging and to identify technologies and practices that can be documented and recommended to the community. The board decided to focus on documenting the science of imaging; that is, the objective measures of image quality, such as color, tone, and resolution, and how they can be controlled at various stages of the imaging process. It identified five areas in which to address these issues: setting up an imaging project, selecting a scanner, creating a scanning system, producing a digital master, and generating digital derivatives. Board members created detailed outlines for guides in these areas and suggested authors, whom DLF and CLIR commissioned to write the guides. The guides will be published on the RLG Web site in late 1999.
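As a rough illustration of the last two areas, producing a digital master and generating digital derivatives, the sketch below derives a screen-sized JPEG from a high-resolution master and reports a few of the objective properties (pixel dimensions, color mode) that the guides treat in depth. The Pillow library and the file names are assumptions introduced here for illustration; they do not come from the guides.

    from PIL import Image  # Pillow imaging library; an assumption, not named in the guides

    # Hypothetical file names for a high-resolution master and its derivative.
    MASTER_PATH = "master.tif"
    DERIVATIVE_PATH = "access.jpg"

    # Open the master and report measurable properties of the image.
    master = Image.open(MASTER_PATH)
    print(f"Master: {master.size[0]} x {master.size[1]} pixels, mode {master.mode}")

    # convert() returns a new image, so the master is left untouched;
    # JPEG output requires RGB, whatever the master's original mode.
    derivative = master.convert("RGB")

    # thumbnail() resizes in place and preserves the aspect ratio.
    derivative.thumbnail((1024, 1024))
    derivative.save(DERIVATIVE_PATH, "JPEG", quality=85)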