William - Digital Library Research Group

Rob - heads up Metadata Services Unit

Project to describe 2 new pieces of functionality for the DSpace platform…

MIT OpenCourseWare is …

DSpace is …

Sponsored by MIT iCampus, 6-year, $25M alliance with Microsoft Research

2 year project, “CWSpace” = OCW + DSpace

Key topics: To support need for InterOperability, embrace Standards for:

-application profile development (in context of content packaging)

-Web Services access (new) to DSpace digital archive

Adoption of “Standards” always has a degree of interpretation, customization. Needs to be documented, made machine processable ideally, and further developed with other parties…

CUE NEXT SLIDE

O.K., to get some CONTEXT, what our Project is about … TURN (Proj. Vision, Goal, Method …)

MIT iCampus program: Microsoft Research alliance; $25M over 6 years; advanced educational technology enhancement projects

“Vision” embraces the role that this new material, OpenCourseWare, can play in the by now familiar Open Access imbroglio: the scholarly record vs. the entertainment industries over intellectual property rights and law, and what universities need to do to ensure they have a “place at the table” when bargaining in this arena. Link to notes on the Hal Abelson talk on the Information Commons, guarding against encroachment of IP interests and economic forces (read: publishers’ greed), etc.

http://msl1.mit.edu/furdlog/index.php?p=2996

“Goal” is more straightforward (!), nearly self-explanatory. All the Good Things come out of achieving this goal. (Well, at least the plumbing/mechanism is set up to permit people to now Do the Good Things with these kinds of tools.)

“Method” merely expresses the still very high level thinking about principles to employ in achieving the goal

CUE NEXT SLIDE

So, what drove that Vision? … TURN … (need for InterOp)

Here we mean InterOperation not only from system to system, but from entire domain to another: the world of teaching in higher education and that of the libraries.

“inter-operability” ™ with lowercase ‘i’ and ‘o’, and tongue-in-cheek “TradeMark,” is meant to signify “First Level” of interoperability, namely, that of moving packages of content around successfully.

First Level is all that this project claims to achieve.

•Second Level of interoperability might be that the semantics of what is received are understood, able to be processed.

Metadata like RDF, etc.

•Third Level might be that the pedagogical design of the prepared materials is (or can be made to be) coherent and suitable to incorporate with other materials in the collection. Learning Design and similar topics in which we don’t claim expertise.

Also: Higher Ed Tech increasingly sees the need to archive, to preserve, and to get persistent URLs to material. Also: simply the “born digital” nature of teaching materials today is a large driver for asking libraries/repositories to manage it (cf. the past, when no one conceived of libraries holding course materials on paper, overhead slides, etc.)

CUE NEXT SLIDE

So, the growing intersection of interests and activity between these two domains creates some NEW OPPORTUNITIES for development … TURN (New Ground)

"What We Have" (to work with)…

Normalized, tightly structured content. The structure amounts to buckets. A static publication of html and pdfs. More like a book than a “living” website.

The MIT OpenCourseWare website has achieved the by no means small feat of gathering in one place and within one overarching structure the myriad types and kinds of pedagogical materials in use at a large research university, much of which was not even digital.

In so doing, they have established what had to be --if they were to succeed-- a highly normalized single model within which to present every possible MIT course, in all their variety and distinctiveness. They brought order where there had been none before, since there never was reason to establish it.

One big plus coming out of the normalization of courses is the set of "Sections" that OCW has derived from the range of materials and purposes to which it's put in MIT classes. This set of some 15 or so heading labels to organize the content according to use or kind or type is a very useful first cut at organizing a course's content, and is the first (and only) <organization> we use in the IMS Content Package manifest. Other future organizations may be used by other future consumers of the content, but this initial contribution of organization has been critical to a successful user experience of OCW material widely. You know what you are getting, from course to course, thanks to these Sections.

The resulting "Object Model" lends itself well to a process of mapping onto a digital archive, as, at the end of the day the rendered publication that is OCW is a large statically served website. It is this that we capture into a content package and hence to the digital archive.

The drawback to this process of heavy normalization is that the raw materials, while they would have been unwieldy to maintain closer to the publication engine, are in large part left behind in terms of repurposing, disaggregation, or even editing. This was in practical terms not possible, especially with the time pressures to publish the essential first representation of the content, namely the public website. Hence the publication of most material to the Portable Document Format (PDF) and similar decisions.

Subsequent, more varied uses of the content are now being looked at (e.g. archiving; distribution to other audiences: faculty, translation partners, education partners, etc.), and the prospect of making available some of the editable originals (e.g. MS-Office files) is being investigated.

NEXT SLIDE SEGUE

O.K., so … TURN … What Do We Want To Do With It ?

1st of 3 slides re: What it is we’d like to be able to do vis-à-vis Interoperation

-Here, what we can achieve

-Next, where we’re blocked by policy and Intellectual Property issues

-Finally, a possible resolution to those issues (a dark(er) archive) t.b.d.

-This achieves “Job 1” of OCW to DSpace, good.

-Also “Job 2” of the Round Trip to support OCW needs.

-Finally, “Job 3” means that others can pull from DSpace packaged versions of OCW courses, using Web Services (or the Web U/I)

Note: This is within the single institution. More complex to carry further afield…

CUE NEXT SLIDE:

O.K., back down to earth, to our project’s scope, and to our 2 key topics: Application Profile for Content Packaging, and then (later) Web Services … TURN … (Object Model)

What does it mean, what is NEW, about working with OpenCourseWare content, in terms of archiving to DSpace?

(That is, this is in comparison with what DSpace has been typically used for to date: the research and scholarly record.)

1. Policy & the “What”-ness of it:

Educational material is a more recent arrival to Institutional Repository (IR) archiving treatment, contrasted with traditional scholarly research, the academic record.

Teaching & learning materials regarded as perhaps less polished, final; can be ever-changing. Often more suitable to Content Management Systems than IR/archives.

OpenCourseWare is a static, end-of-semester snapshot of entire course offering; better first candidate for “Ed. Tech meets Digital Archive.”

2. Granularity:

OCW @ MIT does not have “Learning Objects,” per se. Instead our atomic unit is at the Course level. Ideally, educational content in DSpace could be treated at a more granular level (future work).

3. Complexity:

The OCW “Courseware” is a static website. That’s comparatively straightforward (compare dynamic CLEs), but still represents new ground for DSpace, which as classic IR had initial designs to handle very simple Items: a .PDF paired with a PostScript, and similar. Educational content almost always involves more complex digital objects (applets; online textbooks; course sites; etc.)

4. Back-End to Teaching/Learning Front-Ends:

More than the traditional scholarly materials, this teaching & learning content (as is obvious from what we call it) is far more likely to see classroom use, and to be served by CLEs or Image Tools and similar. This is at least one part of what drives the need for networked access via Web Services (the other, primary, need being the OCW-2-DSpace archive conduit).

CUE NEXT SLIDE (or “BUILD” in this case):

So, how does our project respond to these new things ? … CLICK (“Turn”) … (CP and WS)

Our project responds to these four areas like so:

The first two = no response, as such.

The latter two = the heart of our presentation: CP and WS.

1. It’s a given, for us, that the OCW content merits archiving.

It remains a discussion re: content direct from CLEs (unfiltered through OCW)

2. For our project the bar is not terribly high re: disaggregation & similar of OCW courses. They don’t lend themselves to it, and the repository doesn’t (yet?) lend itself to dealing with a lot of flexibility in this vein. Leaving this “As-Is” for the moment. Future work.

3. Content Packaging, and in particular an Application Profile to content packaging, is central to our work. As for the choice of _which_ content packaging standard, details later.

4. Web Services are called for, and we chose a particular technical approach to this that maps well to a digital archive. Details later.

CUE NEXT SLIDE

So, before we dive into these two topics, what is a BIRD’S EYE VIEW of what we are trying to achieve ? TURN … 1st of 3 InterOp…

This is just a Table-Of-Contents slide, for the material in the upcoming section

What is this content we have, from OCW?

The need to look at Object Models, both source and target systems that expect to interoperate

The various Packaging specifications that exist, which is best for the situation

The need to establish an Application Profile to dictate the particulars of your situation

Some observations re: tools that help document that Profile

What we'd like to talk about is the Object Model of the two domains: source OpenCourseWare mapping onto how the DSpace digital archive will represent the same material.

The entire picture of each domain is greater than the (sub)-set of objects that might actually go into a Content Package

… CLICK ("TURN") …

Some though not all of the overall Object Model mapping will be conveyed via the Content Package.

For example, the OCW homepage and the DSpace topmost community for OCW are not things from the Information Architecture that go in each package.

Design considerations include

-Re: Object Model overall, what parts correspond to the Package, what are beyond it

-Re: within the package, some thinking about how the material will be used, esp. re: granularity, aggregation, disaggregation, relationships that should be established or maintained in the representation inside the new domain (archive)… OR the subsequent re-use in other systems…

… TURN

…where the Content Package now becomes the OAIS Reference Model's SIP and DIP, more of a repository view of the material

In fact, one key Use Case is simply the Round Trip back to OCW, so consideration of the DIP is important to that (it's effectively also round-trippable (by definition) to DSpace)

Other use cases might involve other systems wanting different levels of aggregation, organization, and types of descriptive metadata.

Rob working on slide re: details of OCW Object Model mapping onto DSpace object model…

OCW model … a publication. Highly normalized. Course … Section … Resource (single file, that).

That's about it. (This is a Good Thing.)

DSpace model … a digital archive. Hierarchical model, with ability to support cross-mappings, but at the leaf Bitstream, the hierarchy ends and all bitstreams are siblings, despite whatever complex hierarchical relationship they may have had in the source system (e.g. a website).

Community … [Sub-Community] … Collection … Item … [Bundle] … Bitstream

(Optional discussion: "Bundle" can be used somewhat akin to METS fileGrp @USE notion. In practice, most everything goes into a single Bundle named "Content." Other bundles for metadata, for licenses. You could also use for Thumbnails and similar purposes.)

Bottom line: the OCW course website's files, all of them, flow into one tall stack of sibling Bitstreams in the DSpace Item. DSpace attends to the rendering of the website as though the files were re-distributed back out to a set of directories and sub-directories to be able to be the website once more, as viewed from DSpace. Note that this calls for relative paths (e.g. "../../") in the source website pages (as would be true of any approach to making a website portable).
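To make the "tall stack of siblings" point concrete, here is a minimal sketch in Python. It is not DSpace code: the record layout, the function names, and the idea that each bitstream keeps its website-relative path as its name are all assumptions made for illustration. It shows that a flat list is enough to rebuild the directory layout, and that a relative link such as "../../" still resolves to one of the siblings.

```python
import posixpath
from pathlib import PurePosixPath


def flatten_site(paths):
    """Turn website-relative file paths into a flat list of sibling records.

    Assumption: the relative path is kept as each bitstream's name, so it is
    the only surviving trace of the original directory hierarchy.
    """
    return [
        {"sequence": i, "name": str(PurePosixPath(p))}
        for i, p in enumerate(sorted(paths), start=1)
    ]


def rebuild_tree(bitstreams):
    """Recover the directory layout (nested dicts) from the flat names."""
    tree = {}
    for b in bitstreams:
        parts = PurePosixPath(b["name"]).parts
        node = tree
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = b["sequence"]
    return tree


def resolve_link(page, href):
    """Resolve a relative href against the page that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(page), href))


# Made-up file names standing in for one course website.
site = [
    "index.html",
    "styles/site.css",
    "lectures/week1/index.html",
    "lectures/week1/notes.pdf",
]
bitstreams = flatten_site(site)
tree = rebuild_tree(bitstreams)
css = resolve_link("lectures/week1/index.html", "../../styles/site.css")
```

An absolute link (say, one starting with "/") would not survive this round trip, which is why the source pages need relative paths.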

The OCW Course will correspond to the DSpace Item. … TURN

The OCW "Resource" (e.g. HTML page, .PDF, .JPG) will be a Bitstream in DSpace … TURN

… but as discussed, DSpace will also store as an Item's Bitstream the other single files from the OCW Course: the Section HTML pages, the "Third Level" HTML pages, and all Resources, no matter where they appear in the website hierarchy.

This rendering of websites from within DSpace is another area of "New Ground" for the DSpace platform, as developed within the CWSpace project.

NEXT SLIDE SEGUE

So, having looked at the object models, how does this relate to content packaging? … TURN

The ability to package content is widely needed, and so various domains have created their own specifications: METS in the Libraries domain, IMS-CP in higher ed., DIDL in use commercially and elsewhere, and XML Formatted Data Units (XFDU) in Aerospace

This project really only considered METS and IMS-CP to any extensive degree

Note that METS will be central to the DSpace platform's internal metadata representation in its next major release ("DSpace 2"), while IMS-CP is extensively used by higher education platforms

NEXT SLIDE SEGUE

Let's look at METS and IMS-CP … TURN …

In Common:

A comparatively straightforward manifest to account for the files in the package, and another approach to listing the content from various organizational or structural points of view.

Various places to attach relevant metadata.

Techniques for providing for recursive package structures.

Techniques for pointing to information found in files external to the manifest.

Differences

METS

More emphasis on file as file, independent of its use in any given component or resource.

More metadata (typically) captured per file.

Various kinds of metadata captured to own metadata area of manifest; connected to relevant content via @ID/@IDREFs

Content overall is maintained in slightly more abstract manner; subsequent users of the METS file may well author new software to read and render this content, in new ways etc.

IMS-CP

More emphasis on resource components, "playable" pieces that cohere, have purpose, context.

Files therefore organized, manifest-wise, within the resource listing.

Metadata can be placed off of several different elements in the manifest; more open to interpretation of its meaning, relevance.

Subsequent users of the IMS content package more likely to use in some mode as intended when packaged (using those organized resources). Not to say complete disaggregation couldn't occur, but less frequent scenario.

MAPPING

Shown here are some initial thoughts on mapping from METS onto IMS-CP. This was devised for the current working group updating the IMS-CP specification to version 1.2 (along with current work on Common Cartridge, a subset of the IMS CP for use between publishing houses and university CLEs)

NEXT SLIDE SEGUE

We elected to go with IMS-CP, as that really would better serve OCW's needs with various audiences other than the DSpace archive: faculty, other CLEs, other OCW institutions, education partners, translation partners. Also a simple downloadable course .ZIP was yet another benefit of this (for general site visitors).

Having selected IMS's Content Package, we need to map the OCW object model onto that, for interoperation with DSpace and other parties.

A few decisions on use:

-No use of IMS-CP's "sub-manifest"

-Instead, nesting of plain <item> elements within the <organization>, to mimic the website structure

-OCW "Resources" (and "Sections" and Course Home Page and Detail Pages) all become IMS-CP <resource>

-Individual files are off resources

-LOM is attached to the Resource (not the Item)

-Some utility files are not resources

-OCW had a few ideas about additional management metadata needs

-Some of these were OCW-specific : ocw_imscp.xsd

-Some of these could be more generally "CWSpace": cwsp_imscp.xsd
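The decisions above can be sketched as a small manifest builder. This is an illustration under stated assumptions, not the project's packager: the function name, the input dictionaries, the generated identifiers (ORG1, ITEM1, RES1, …) and the sample titles are all made up, and the LOM metadata and the OCW/CWSpace extension schemas are left out. Only the IMS-CP element names and the v1.1 namespace URI come from the specification.

```python
import xml.etree.ElementTree as ET

IMSCP_NS = "http://www.imsglobal.org/xsd/imscp_v1p1"
ET.register_namespace("", IMSCP_NS)


def q(tag):
    """Qualify a tag name with the IMS-CP namespace."""
    return f"{{{IMSCP_NS}}}{tag}"


def build_manifest(course_id, course_title, sections):
    """Build a manifest with one <organization> of nested <item>s.

    Each node in `sections` is a dict with "title", "href" and an optional
    "children" list; every node becomes an <item> plus a single-file <resource>.
    """
    manifest = ET.Element(q("manifest"), {"identifier": course_id})
    orgs = ET.SubElement(manifest, q("organizations"), {"default": "ORG1"})
    org = ET.SubElement(orgs, q("organization"), {"identifier": "ORG1"})
    ET.SubElement(org, q("title")).text = course_title
    resources = ET.SubElement(manifest, q("resources"))
    counter = 0

    def add_item(parent, node):
        nonlocal counter
        counter += 1
        res_id = f"RES{counter}"
        item = ET.SubElement(
            parent, q("item"),
            {"identifier": f"ITEM{counter}", "identifierref": res_id},
        )
        ET.SubElement(item, q("title")).text = node["title"]
        res = ET.SubElement(
            resources, q("resource"),
            {"identifier": res_id, "type": "webcontent", "href": node["href"]},
        )
        ET.SubElement(res, q("file"), {"href": node["href"]})
        # Plain nested <item>s mimic the website structure (no sub-manifests).
        for child in node.get("children", []):
            add_item(item, child)

    for section in sections:
        add_item(org, section)
    return manifest


sample = build_manifest(
    "course-1",
    "Sample Course",
    [{
        "title": "Syllabus",
        "href": "syllabus/index.htm",
        "children": [{"title": "Reading list", "href": "syllabus/readings.pdf"}],
    }],
)
xml_text = ET.tostring(sample, encoding="unicode")
```

Utility files that are not resources in the OCW sense (stylesheets, scripts) would still need `<file>` entries somewhere in the manifest; the sketch omits them for brevity.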

Here we see how the basic hierarchical organization of the OCW model maps fairly straightforwardly onto the IMS-CP <organization> and its nested <item>s….

Course to IMS-CP <organization> of type="Course"

TURN

Section HTML page maps to IMS-CP <item> …

TURN

Subsequent OCW Resources etc. map to nested <item>s

TURN

Here we see at the level of file accountability how each Resource from OCW maps to a file beneath the IMS-CP concept of "Resource," which in turn is pointed to by the IMS-CP's Item.

Dotted Line Note: there are some miscellaneous, "non-Item" support files: utility files for rendering web pages (.gifs, .js, .css, etc.).

These too must be accounted for to go into the Package.

For the Target System of DSpace, the IMS-CP manifest will map down into the repository as follows:

TURN

The overall manifest is for the OCW Course, and the topmost organization represents that. This is mapped to create a new DSpace Item, within the Collection for that Department (within the SubCommunity of "Archived Courses", within the overall community of "OCW").

The first page (Section HTML page) is mapped onto a DSpace bitstream …

TURN

As in fact are all files.

Note that in the OCW object model each "Section" and each "Resource" is in fact a single file. This is why the <item> can be construed as mapping straight onto a Bitstream…

TURN

Here at the level of file accountability we see how each <item> points to an IMS-CP <resource>, beneath each of which is the file.

That file is mapped onto the DSpace bitstream.

Dotted Line: likewise the miscellaneous, "non-Item" support files are in DSpace going to be Bitstreams.
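The target-side mapping just described can be sketched as follows. This is an assumption-laden illustration, not the DSpace packager plugin: the inline manifest, the output dictionary layout, and the `from_item` flag are invented for the example. It reads a manifest, takes the topmost organization's title for the new Item, and turns every `<file>` into a Bitstream record, including support files that no `<item>` points to.

```python
import xml.etree.ElementTree as ET

NS = {"cp": "http://www.imsglobal.org/xsd/imscp_v1p1"}

# A tiny made-up manifest: two pages referenced by <item>s, one support file.
MANIFEST = """<manifest xmlns="http://www.imsglobal.org/xsd/imscp_v1p1" identifier="course-1">
  <organizations default="ORG1">
    <organization identifier="ORG1">
      <title>Sample Course</title>
      <item identifier="ITEM1" identifierref="RES1">
        <title>Syllabus</title>
        <item identifier="ITEM2" identifierref="RES2"><title>Readings</title></item>
      </item>
    </organization>
  </organizations>
  <resources>
    <resource identifier="RES1" type="webcontent" href="syllabus/index.htm">
      <file href="syllabus/index.htm"/>
    </resource>
    <resource identifier="RES2" type="webcontent" href="syllabus/readings.pdf">
      <file href="syllabus/readings.pdf"/>
    </resource>
    <resource identifier="RES3" type="webcontent">
      <file href="common/site.css"/>
    </resource>
  </resources>
</manifest>"""


def manifest_to_item(xml_text):
    """Map a manifest onto one Item record holding a flat list of Bitstreams."""
    root = ET.fromstring(xml_text)
    title = root.findtext("cp:organizations/cp:organization/cp:title", namespaces=NS)
    # Resources that some <item> points to, at any nesting depth.
    referenced = {i.get("identifierref") for i in root.iterfind(".//cp:item", NS)}
    bitstreams = []
    for res in root.iterfind("cp:resources/cp:resource", NS):
        for f in res.iterfind("cp:file", NS):
            bitstreams.append({
                "name": f.get("href"),
                # False marks the "dotted line" support files.
                "from_item": res.get("identifier") in referenced,
            })
    return {"title": title, "bitstreams": bitstreams}


item = manifest_to_item(MANIFEST)
```

Note that the nesting of `<item>`s plays no part in the result: whatever depth a page had in the organization, its file ends up as one more sibling Bitstream.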

Need for Application Profile perhaps greater with Content Packaging, as the specs are by definition quite loose. They're empty envelopes to carry content and metadata, and are markedly unspecified. Compare some descriptive metadata specifications (??), where there is still plenty of contention, but you aren't contending with "wildcard" use of entire areas of the spec '*' goes here. (???)

NEXT SLIDE SEGUE

These are the options re: Submit.

The LNI also permits other programmatic actions on DSpace (e.g. map an item to another collection (COPY); list all collections I can submit to; disseminate an item (in a package); list all collections within a certain community; etc.).

The "Lightweight Network Interface" (LNI) is a work in progress to provide (yet) another way to gain networked access to the DSpace application-level API. Initially developed to support the requirement that "Web Services" be used on the CWSpace project (archiving MIT's OpenCourseWare (OCW) to DSpace), the ensuing vibrant (!) discussion regarding possible technical approaches (SOAP, RESTful, WebDAV, XML over HTTP) led us to define some high level goals for how this new interface ought to be shaped (e.g. platform-neutral; based on mature standards; readily extensible; work with (not replicate) existing remote APIs (SRW, OAI-PMH); comprehensive view of DSpace model; etc.).

Dubbed the "Lightweight" network interface, the intent has been to largely adopt the robust and proven protocols (and "verbs") of WebDAV and HTTP, and to establish a proposed mapping of WebDAV's Resource-centric view onto the DSpace object model. (Note that a set of SOAP methods has also been developed on top of the WebDAV functionality, such that in fact either approach is supported.) The details of the resulting proposed API were posted on a lengthy page to the DSpace Wiki; a "smoketest" Java client to the LNI was posted to the CWSpace Wiki (along with detailed instructions on how to work with SSL and the LNI).

This presentation introduces the technology, the API, the rationale for the approach, and a discussion of the mapping to the DSpace data model as well as example uses of the LNI to DSpace (e.g. submit; disseminate; copy to another collection; list collections; etc.).

Closely related DSpace development activity of the Plugin Manager, Packager Plugins, and Crosswalk Plugins will be addressed briefly as well, as these are used in conjunction with the LNI on the CWSpace project for submission (and dissemination) of IMS Content Packages of OCW content (courseware websites).

DAV = "Distributed Authoring and Versioning" more on that in a moment.

Note: There is also a SOAP (with WSDL) equivalent interface.

It's a Web-based protocol — a remote API to control DSpace

It's at the DSpace application layer — a peer of the Web U/I, can run in same servlet container

SEGUE NEXT SLIDE

We'll have a micro-intro to WebDAV, but first let's spend a moment on that DSpace object model…

Not everything is included in this picture: E-Persons; Workflow Items; Workspace Items.

Some of these are addressable via the LNI

Web server's file system rendered "DAV"-able

Shared authoring via desktop tools

Drag & drop, etc.

Subversion (SVN) is based on DAV

Maps to a file system. This is what it was designed for.

NEXT SLIDE SEGUE

Use of these semantics against a digital repository is a fairly good mapping…

Jump to the bottom to note that it is only with the DSpace Bitstream that we map to the leaf node "Resource" in WebDAV.

Also note that the DSpace qDC Metadata does *Not* map to the Properties in WebDAV.

Finally, for the LNI application (DAV Server implementation), each DSpace object needs to have an addressable URL, special to the LNI (*not* for other, interactive, persistent uses) (Not for your browser; not to be "bookmarked"; they are used for the LNI interaction only.)

NEXT SLIDE SEGUE

We'll next look at what kinds of values those properties hold…

"TOC" page: next few slides review the Properties of WebDAV, the way DSpace provides a URL namespace for this, and the methods implemented in the LNI.

There are some 87 properties, spread across the some ten DSpace objects that the LNI currently provides for.

These can be "found" (PROPFIND).

Some can be "patched" (PROPPATCH) - modified.
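For readers who have not seen a PROPFIND on the wire, here is a minimal sketch of building one in Python. It only constructs the request (method, headers, XML body) and sends nothing. The function name is made up, and the example asks for the standard WebDAV `displayname` property because the names and namespace of the LNI's own properties are not given in these notes.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
ET.register_namespace("D", DAV_NS)


def propfind_request(property_names=None, depth="0"):
    """Return (method, headers, body) for a WebDAV PROPFIND.

    With no property names the body asks for <allprop/>; otherwise it lists
    the named properties (assumed here to live in the DAV: namespace).
    """
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    if property_names:
        prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
        for name in property_names:
            ET.SubElement(prop, f"{{{DAV_NS}}}{name}")
    else:
        ET.SubElement(root, f"{{{DAV_NS}}}allprop")
    body = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    headers = {
        "Depth": depth,  # "0" = this resource only, "1" = plus its children
        "Content-Type": 'application/xml; charset="utf-8"',
    }
    return "PROPFIND", headers, body


method, headers, body = propfind_request(["displayname"])
```

A PROPPATCH is built the same way, with a `propertyupdate` root element carrying `set` or `remove` children in place of `propfind`.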

The LNI URIs for DSpace objects are only for use with the LNI.

They are therefore not to be used:

as persistent URIs (e.g. bookmark)

as interactive URIs (e.g. browser)

3 assertions:

1. Handle - to - URI | URI - to - Handle

If you have a Handle, to do anything with the LNI, you must invoke the LNI lookup() function to get the DSpace LNI URI for that resource.

1.a. [Corollary] If you have a DSpace LNI URI, you may do a PROPFIND (or SOAP equivalent) to learn the Handle (if applicable) for that resource.

2. URIs Not Persistent.

If you have a DSpace LNI URI, you must NOT (ever) (we mean it!) "bookmark" that or otherwise maintain it outside the LNI session for any future use.

3. URIs are Opaque (!)

[Related to # 1] You must NOT guess or otherwise process any values (e.g. Handle) to a DSpace LNI URI
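One way a client might hold itself to these three assertions is sketched below. Everything here is hypothetical: the `LniSession` class, its methods, and the placeholder Handle are invented for illustration, and the `lookup` callable merely stands in for the LNI lookup() function, whose real signature is not given in these notes.

```python
class LniSession:
    """Hypothetical client-side wrapper that enforces the three URI rules."""

    def __init__(self, lookup):
        # `lookup` stands in for the LNI lookup() call: Handle in, URI out.
        self._lookup = lookup
        self._uris = {}
        self._open = True

    def uri_for(self, handle):
        """Rule 1: the only way from a Handle to an LNI URI is a lookup."""
        if not self._open:
            raise RuntimeError("session closed; look the Handle up again")
        if handle not in self._uris:
            # Rule 3: the returned value is stored and passed along untouched,
            # never parsed or assembled from the Handle.
            self._uris[handle] = self._lookup(handle)
        return self._uris[handle]

    def close(self):
        """Rule 2: forget every URI when the session ends."""
        self._uris.clear()
        self._open = False
```

A caller keeps Handles as its long-lived identifiers and asks the session for a URI each time it needs one; nothing obtained from `uri_for` should be written to disk or reused after `close()`.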

COPY is really more like LINK, but that's not mature

No locking or versioning.

SOAP uses same HTTP PUT as WebDAV (vs. less mature SOAP Attachments)

DSpace-admin can map items, can change comm/coll administrative metadata.

Can list collections e-person can submit to.

Can list all collections in comm, etc.
