Good morning, I am Sarah Chandler.  I am a librarian at the School of Industrial and Labor Relations at Cornell, and for the past two years, I have been a member of the Cornell ENCompass project team.  My work on the team has focused primarily on user interface design, and today I am going to talk with you about how we have incorporated user feedback to modify the interface to our new system which is built on Endeavor’s ENCompass and LinkFinder Plus products.
[Read these…]
Find Articles/Find Databases/Find e-Journals is Cornell University Library’s new system for searching networked electronic resources, both proprietary and non-proprietary.  This research portal began over 10 years ago when Mann Library created a Gateway made up of two principal parts—a home page and the Gateway database, which was a database of metadata about selected networked resources.  Some of the resources were free, but many were licensed from a host of content providers.
In 1998, we redesigned and reintroduced the Mann Library Gateway as the common entryway to networked resources, services and information for the entire Cornell library system. In 2001, because the scope of the Gateway database had become rather unwieldy, we redefined it as the “e-Reference Collection,” which was a database of metadata about e-resources of significant reference value. By early 2003, it contained descriptions of about 1,000 databases and Web sites, including full text, indexes, catalogs, and numeric data.
The metadata in the e-Reference Collection permitted searchers to discover and then connect to the resources in its collection.
The e-Reference metadata was stored in a MySQL database and used the search engine Glimpse.  Networked resources that were selected for e-Reference were cataloged in Voyager and the MARC records were then transferred to the MySQL database and augmented with authentication and authorization information, as appropriate.  Users could either search or browse the database.
This system was a tremendous success.  In calendar year 2001, connections to e-Reference resources averaged well over 55,000 per month.
While e-Reference clearly worked well for our users, we knew that they expected new functionality, including searching across multiple databases at the same time, article level searching and reference linking.  Users had been trying to execute article level searches from the front page of e-Reference for years, not understanding that their searches were too specific.  While more experienced users realized they needed to locate and connect to the relevant database in order to retrieve abstracts and articles, less experienced users often ended their initial searches in frustration after retrieving no hits.
We also had technological and administrative reasons to change systems:  IT support for the e-Reference Collection moved to the Library Systems Office and a new server.
Timing and opportunity were also key to the decision to change systems.  The Cornell Library ENCompass project team was part of a development partnership with others around the country who worked with Endeavor to develop the ENCompass and later LinkFinder Plus products, which were designed to merge traditional and digital library environments.
The initial prototype and first release of ENCompass supported only nonproprietary resources.  However, by March 2002, Endeavor released version 2.0 of ENCompass, which was supposed to support proprietary (licensed) networked resources.
This release of ENCompass prompted us to move ahead with our plans to migrate the e-Reference Collection from its current platform to ENCompass.
Here is a look at key functionalities we anticipated ENCompass 2.0 would allow us to introduce.  On the right, you can see the new features (read bullets).  On the left are the features already built into e-Reference which we chose to keep as they still met the needs of our users. (read bullets)
I should note that while ENCompass 2.0 included authentication capability, we found it was not ready to handle authentication for our remote proprietary resources.  Our staff eventually integrated a custom authentication system using EZ-Proxy.
As we began working with release 2.0 “out-of-the-box” we recognized that we would require significant customization of the interface in order to meet the needs of our user base.  The product allowed for extensive customization through XSLT, which required a serious time commitment on the part of our team members with that expertise.  As a development partner, we were also able to turn to Endeavor for consultation.  We knew that starting down the road of customization would mean more complex maintenance and integration of upgrade enhancements in future, but we anticipated that the result would be a greatly improved product for our users.
[PAUSE]
We started building our new system interface in earnest in late summer of 2002, with the goal of coming up live the following May.  By fall of 2002, we had incorporated most of the new features and had a prototype that was functional enough to warrant user testing.  While we were still early in the interface design process and had plenty of useful input from members of our cross-functional project team, we knew that hearing from our users directly at this early stage was vital.
 
We formed a usability testing team which decided to conduct two rounds of small focus groups, one in October and one a month later.  In preparation for the focus groups, the usability team conducted a “mock session” with our project team and worked out some of the kinks of observing, recording and collating user behavior.
Our first focus group round was conducted using the “paper prototyping model,” where users were asked to study and “interact” with paper models as though they were using a live system.  We used screen shots printed on regular sized paper and on poster sized paper, employing a plotter.  We asked users a series of questions and recorded observations as they took actions and moved through different screens.  At this point, our prototype still resembled the out-of-the-box ENCompass interface fairly closely.  We conducted sessions with about twenty faculty and students separately.
After the first round, we collated the results and came up with our top recommendations for improving the interface and presented them to the project team.  (I will show you some of these in a few minutes.)  We had about two staff members dedicated to customizing the interface via XSL and XML so that we could integrate the suggested changes in a very short amount of time.
Within a month, we had many of the changes in place and were ready to test another set of users.  We held a second round of focus groups in November, using live workstations.  Again, we had a script of questions for the users to work through and recorded their actions and observations. 
  [PAUSE]
Let’s take a look at what our users saw in these rounds of testing, but before we do that….
This is a screenshot of how we were experimenting with ENCompass 2.0 shortly before our October user testing.  Note that the interface used tabs to introduce two components:  searching metadata about resources (Suggest a Resource) and browsing or searching for and within all types of resources (Browse/Search Resources).  Browse/Suggest a Resource allowed for searching both local and remote repositories.  On our Browse/Suggest a Resource screen, we included mostly categories linking to databases in our local repository.  These were databases that users could search by descriptive metadata within our system and then connect to for “native interface” searching.  We also included “A Direct Searching Connection,” a category for those remote repository connections we were able to configure.  Resources under this category could be searched right down to the article level.  Although HTTP, XML Gateway and Z39.50 connections were offered in 2.0, we found Z39.50 to be the most reliable.
[PAUSE]
Our project team found this mixing of databases and journal articles in a federated search interface complicated and potentially confusing for our users.  While we knew users would welcome the ability to search down to the article level and select more than one database at once, we felt that this “mixed granularity” approach would create confusion.  We made some modifications before presenting to our October test users, as you will see…
 
Here is what our main screen looked like for the October 2002 user testing sessions.  It still replicated the “Browse/Search Resource” screen of the ENCompass 2.0 interface quite closely, though we did choose to remove the direct searching connections and integrate article level searching into the interface later.  From the page you see, users could only search and browse metadata about resources, as was the case with the e-Reference interface.
The 2.0 design included a shaded table format with expandable/collapsible categories and checkboxes and presumed that the user knew to type a query term in the search box and then select one or more categories and/or specific resources below it to run a search.  We found that users were confused by this design, for several reasons.
Browsing and searching capabilities were presented in a way that was not intuitive.  It took users some time to figure out how to navigate the series of checkboxes beneath the search box.  This was compounded by the expanding/collapsing checkbox categories design; expanding a category meant reloading the screen with more subcategories showing and more shaded rows in the table to scroll through.  Users were disoriented as they clicked deeper and deeper into the same screen.
Further, some users did not understand that they could select and search for more than one resource at a time.  While cross-collection searching was a familiar concept to our users in other contexts, it was new to the e-Reference collection and required clearer presentation.
As had been the case in the past, users still attempted to perform article level searching, despite the explanatory text at the top of the page.  The e-Reference interface had included similar text, but clearly, many users were not reading it.
Finally, the name “e-Reference” appeared to hold less meaning than we anticipated for most of our users.  In this early prototype, you can see that we retained the “e-Reference collection” name at the top.  We wanted to preserve the identity of this heavily used system and had assumed that keeping the same name was important to our users.  While it is certainly true that experienced users identified with the name and that many librarians favored keeping it, our sample indicated that the name was not particularly meaningful to new users.  And, as we looked at adding the article level searching functionality, the name became problematic.
Here are some of the key recommendations we made based on this round of user testing:
The screen design was far too cluttered.  Users were looking for a Google-like search box with very little text.
Combining searching and browsing in the current presentation, on one screen, did not work.
“e-Reference” was not important enough to users to keep as a name and would no longer make sense as we added functionalities to the system and interface.
As they had with e-Reference, users still expected article level searching from the front page, and now that we had this capability, we needed to present it in an intuitive way.
I do not have screen shots of the prototype that was available for the November round, and, in the interest of time, I will skip ahead to what the interface looks like now, post-launch.  To summarize, the nearly twenty users in the second round found the interface easier to understand and had suggestions for more minor design adjustments.
Here is a later version of one of two new functioning screens we had users test in our November round.  Notice that we went with the tab approach but used simple, understandable terms, as Jakob Nielsen recommends and as is the current trend with many library research portals.  From our Library Gateway home page navigation bar, we planned to include each of these tabs as a separate entry point into the system:  Find Articles, Find Databases, and (later) Find e-Journals.
The screen you are looking at, Find Databases, allows for resource discovery, what ENCompass 2.0 was calling “Discover a Resource” in that screen I showed you a few moments ago.  You will notice that we chose to keep searching and browsing on one screen but separated them and did away with the expandable table/checkbox look and feel.  If users wanted to browse through categories, they could click on a hypertext link and launch a new page, rather than expanding within the same page.  This approach had been taken with the existing e-Reference collection and had worked well, so after experimenting with alternatives, we decided to go with the familiar. 
To try to clarify the relationship between Find Databases and Find Articles, we added a scope note to the right of the search box.  Admittedly, this note added “clutter” but became a priority later as reference librarians submitted their feedback.
Here is an example of what you might see if you expanded a browse category below the search box…
The user could either connect to the resource from this screen or go to the “more info” screen for descriptive information, then connect.
You may notice the “Find e-Journals” tab at the top.  We added this functionality within ENCompass in August of 2003, so it was not included in the user testing in November.
If a user chose to search for a Find Databases resource, either by name or topic, this is what a results screen would look like.
Here is the Find Articles screen.  This is where we moved the remote repository connections, all of which are Z39.50.  Users can search one to eight databases at a time, down to the article level.  We limited the number to eight for two main reasons:
1. To optimize performance.  In this federated searching environment, the response time for the overall search equals the response time for the slowest repository selected.  We wanted to limit the number of databases selected per search to a reasonable number.
2. To address licensing issues.  Some databases allow for a small number of simultaneous users and/or stipulate a per-search pricing structure.  By limiting to eight databases, we hoped to lessen the chance of users being locked out due to the “too many simultaneous users” scenario, and we hoped to avoid exceeding our per-search limit where it applied.  Unfortunately, the only text users see when they are refused a connection is a rather generic “search failure” message and currently we are not able to tailor the text of that message to be more specific.  Also, we did later find that some resources we included on this page which were on a “per search” pricing structure ended up getting unprecedented numbers of connections, and so some resources either had to be removed from the Find Articles page or the license agreements had to be reconsidered.
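To illustrate the first reason, here is a minimal Python sketch of why a federated search is only as fast as its slowest repository.  It is not ENCompass code; the function name, the repository names and the timings are all invented for illustration, and it assumes the selected repositories are queried in parallel.

```python
MAX_DATABASES = 8  # the per-search limit described in the talk


def federated_response_time(per_repository_seconds):
    """Overall wait for one federated search, assuming parallel queries.

    per_repository_seconds maps a repository name to its response time.
    """
    if not 1 <= len(per_repository_seconds) <= MAX_DATABASES:
        raise ValueError(f"select between 1 and {MAX_DATABASES} databases")
    # The user waits until the slowest selected repository has answered.
    return max(per_repository_seconds.values())


# Hypothetical response times in seconds, invented for illustration.
timings = {"Repository A": 1.2, "Repository B": 0.8, "Repository C": 6.5}
overall = federated_response_time(timings)
print(overall)  # 6.5, the time taken by the slowest repository
```

Every additional database selected can only keep the overall wait the same or make it longer, which is one way to see why a cap on the number of databases helps.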
We chose to set up the results screen so that deduping would not occur since we anticipated patrons might want to compare records from different aggregators (full text versus abstract only, etc.)  The display of records from different databases would prove an ongoing XSL maintenance challenge as vendors do not follow one standard.
Though we did not have the next piece available for the November round of testing, I will show you how we later integrated LinkFinder Plus, the reference linking functionality, by clicking on one of the records from this result set…
Here is an object record for an article.  We chose “Find it at Cornell” as our brand name for reference linking.  If a user clicked on this Open URL link…
This would be the result.  It gives a link to full text if available and provides options to search our Catalog.  We chose to focus on only Catalog services for the launch but anticipated later adding other services, such as interlibrary loan.
[PAUSE HERE]
I’ve shown you what our system looked like as of our official launch, in May of 2003.  We’ve seen how user testing influenced the design of our system and helped further our intended goal of better meeting our patrons’ needs.  Along with this feedback from focus groups, we also had input from reference staff, as I mentioned earlier.
At this point, we were curious to see how the system itself was being used, from a more quantitative point of view.  So we wrote and were awarded an internal library grant for performing usage log analysis over the summer.
Our usage log analysis allowed us to observe user behavior without being intrusive and possibly influencing user results.  Since this initial analysis was done during the summer, you should know that we had not made a full cutover to the new system yet.  That is, we had the old e-Reference system working in parallel with the new Find Databases/Find Articles system from May 19th until mid August, when we eliminated the old system completely.  This meant that our analysis of the new system logs would not give us a true picture of how the system would perform at full capacity.  However, it gave us an opportunity to set up the initial analysis on a smaller, more manageable data set.  We had plans for rerunning the analysis once the new system completely replaced e-Reference.
Our data set for the summer analysis came from the Oracle tables in ENCompass and the Apache web logs.  We hired a Communications Department graduate student to run Microsoft Access queries on data from both sources, thanks to a Perl script a project team member wrote, which parsed relevant data from the voluminous Apache logs. Even with this script, we quickly found that merging the two data sets was problematic.  For example, the date and timestamps for each session did not line up in exactly the same way, and whereas Apache contained much more raw data, it actually did not contain the kind of detailed information within each session that the Oracle tables in ENCompass offered.
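As a rough illustration of the parsing step, here is a minimal Python sketch (the team's actual script was written in Perl and is not reproduced here).  It assumes the logs use Apache's Common Log Format, which the talk does not state; the sample line, host address and file path are invented.  Converting each timestamp to UTC is one plausible way to make two sources easier to line up, not necessarily what the team did.

```python
import re
from datetime import datetime, timezone

# Assumption: Apache Common Log Format. The real log configuration is not given.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return the fields of interest from one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "host": match.group("host"),
        # Normalize to UTC so timestamps from different sources share one clock.
        "time_utc": when.astimezone(timezone.utc),
        "path": match.group("path"),
        "status": int(match.group("status")),
    }


# Hypothetical log line, invented for illustration.
sample = '192.0.2.10 - - [19/May/2003:09:15:02 -0400] "GET /find/articles HTTP/1.0" 200 5120'
record = parse_line(sample)
print(record["time_utc"].isoformat(), record["path"], record["status"])
```

Lines that do not match the expected format return None, so they can be counted or set aside rather than silently distorting the totals.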
This said, we did gain some valuable information through the analysis.  Our student used Microsoft Excel to present the data analysis in the form of charts and graphs.  I will show you a few examples of what we found…
This is a look at how many users were accessing the system overall.  To be more accurate, this line graph shows how many sessions were logged during the 10-week period, by week.  A session is started when a user clicks on a link pointing to one of the files associated with the new system and begins performing what are called “search actions,” i.e., moving to another screen, performing a search, or browsing.  Unfortunately, for purposes of the ENCompass log data, we found that sessions ended not only by exiting the system and/or closing the browser but also by clicking on the browser back button.  This meant that if users hit the browser back button rather than navigational buttons provided in the interface itself, a new session ID would be logged automatically.  This problem needs to be addressed in future in order to ensure a more accurate count of session IDs.
Also, remember that we are looking at data during the time period when the old e-Reference system was still in use, so numbers here would be much higher once only one system was in place.
Now that we had federated searching capability, we wanted to know what resources in Find Articles were most popular and how the remote repository connections were working for our users.
Here is a chart of search activity in Find Articles, by repository or database, using data from the ENCompass Oracle tables.  Note that the yellow portion of each bar represents the number of failed searches.  These failures could be due to:
too many users
remote server down
our server down
license/subscription interruption
As I mentioned earlier, users only get one generic “search failed” message, regardless of the reason for the failure.
“Successful zero hits,” the light blue portion of each bar, represented searches that did not return any hits but did not result in a database failure.  Successful hits are in purple.
If you look at PubMed, eighth from the left, you can see that the number of failures far exceeded the number of successes.  Thanks to this analysis, we were able to recognize the problem and remove PubMed from Find Articles until such time as we could find a solution.
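The tally behind a chart like this can be sketched in a few lines of Python.  This is only an illustration: the row layout, the outcome labels and the repository names are assumptions, not the structure of the ENCompass Oracle tables.  Following the chart's categories, a search that returns zero hits without a database failure counts as a success.

```python
from collections import Counter, defaultdict


def tally(rows):
    """Count search outcomes per repository from (repository, outcome) pairs."""
    counts = defaultdict(Counter)
    for repository, outcome in rows:
        counts[repository][outcome] += 1
    return counts


def failures_exceed_successes(counter):
    """True when failed searches outnumber successful ones (hits plus zero hits)."""
    successes = counter["hits"] + counter["zero_hits"]
    return counter["failed"] > successes


# Hypothetical rows, invented for illustration.
rows = [
    ("Repository A", "hits"),
    ("Repository A", "zero_hits"),
    ("Repository A", "failed"),
    ("Repository B", "failed"),
    ("Repository B", "failed"),
    ("Repository B", "hits"),
]
counts = tally(rows)
flagged = sorted(name for name, c in counts.items() if failures_exceed_successes(c))
print(flagged)  # ['Repository B']
```

A repository that appears in the flagged list is a candidate for the kind of follow-up described above, such as temporary removal while the cause is investigated.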
As you can see from the previous slides, our usage log analysis from the summer left us with a lot of questions.  We wanted to know how the system was performing now that students, faculty and staff had returned for fall semester and no longer had access to the old system.  So we revisited the analysis using newer data, this time for a nine-week period, from August 29th to October 29th, for the ENCompass tables, and an eight-week period for the Apache logs (September 4th through October 30th).  Since we found the Excel analysis to be limiting in some respects, we chose to try using the statistical software SPSS.
SPSS allows some of the data manipulation that we had performed with Access queries to be done within SPSS itself, and it offers graphics capabilities along the lines of Excel.  It also holds the potential for better documentation of how the analysis was run.  Since one of the goals of our grant project was to develop a methodology the library could use to run future analysis on this system and possibly others, SPSS was appealing.
Here are some examples of the analysis we ran on the fall data using SPSS…
This slide shows the session ID count for our new data collection period, taken again from the ENCompass logs.  Usage is much higher, which we expected, averaging closer to 15,000 sessions per week.  Our weekly session ID average for the summer data collection period was somewhere between 1,200 and 2,500.  Remember that this session ID number is inflated due to the browser back button issue, but at least it gives us an idea of the overall usage.
The pattern mapped out here mirrors our academic calendar somewhat:  classes began shortly before week 1 of this data collection period, so use rose as classes got into full swing, and use fell and rose again around the Fall Break period, in roughly week 7.
Here is how our users were accessing our new system, based on file name.  We wanted to look at whether they were using Find Articles Search, Find Databases Search or Find Databases browse more.  Since this data comes from the Apache web logs rather than the ENCompass tables, there is no back button issue.  So, we can assert that this data should give a more accurate picture of overall usage.
Here you can see that Find Articles (in red) was used the most, followed by Find Databases browse (in purple), and then Find Databases search (in green).  Find Articles usage, as of late October, rose dramatically.  Since Find Databases browse numbers are higher than those for Find Databases search, we can infer from this slide that users who are accessing databases through the Find Databases page are doing so more often by browsing than by searching.
This slide ranks resources accessed through Find Articles and indicates the success versus failure rate for each database.  The data comes from the ENCompass tables.  You may notice that PubMed is still showing up – we removed it during this data collection period.  The other big offender is Life Sciences Collection, which also was problematic in the summer analysis.  Since we know that our Life Sciences Collection license agreement allows for an unlimited number of simultaneous users, we need to investigate other explanations for why the failure rate is so high.
You might wonder how users are connecting to databases now that we have two points of access – Find Articles and Find Databases….Here is a look at the month of October.  Connections through Find Articles are in the bars to the left, in light blue, and connections to Find Databases are in the bars to the right, in purple.
At first glance, it looks like many more users are searching databases through Find Articles.  However, there is a big caveat here:  whenever a user clicks on a database from Find Databases and then connects, we have no record of how many times he or she then runs searches in that database.  That is information that only the vendor has at the present time.  In contrast, since we have access to the data for how many times a user connects to a remote repository in Find Articles, via our ENCompass tables, we can determine how many actual searches were run per database.  As a result, the number of searches by database in Find Databases should be much higher than what you see here – we are only able to represent the first search each user runs.
Therefore, what you are looking at is not an accurate comparison.  For example, after a user connects to a database through Find Databases, we do not know how many searches were run.  In order to run a true comparison, we would need to obtain vendor information on the number of individual searches run by Cornell users, by database.
Because the Find Databases numbers you see are artificially low, we know that users are accessing certain databases (ProQuest and BIOSIS, for example) through the native interface more often than through the Find Articles page.  (If we were looking at a one-to-one search comparison, the purple bars would be much higher.)  In contrast, take a look at ArticleFirst and Zoological Abstracts.  A relatively small number of users searched those through the native interface, while a significant number of those users searched them through Find Articles.
And what about Find it at Cornell?  We are very interested in how many users are clicking on the Find it at Cornell link from a Find Articles object record, and further, how many are then clicking on the full text when that option is available for a given article.  Since reference linking is a service that users highly value and increasingly expect, we anticipated that full text access would be high.
The chart on the left shows you the breakdown between simply clicking on the Find it at Cornell screen and clicking on the full text link for an article.  The red represents how many times users clicked on Find it at Cornell, and the blue represents how many times they went to the full text.  Given the assumption that users want to go to the full text and would click on a link if it were present, the numbers you see here surprised us.
I added the chart on the right for a point of comparison.  Take a look at the top line, which represents the number of times Find Articles was accessed, week by week.  For example, if we look at week 2 for this chart and compare it to week 2 of the chart on the left, we can surmise that out of about 5,000 sessions where users searched Find Articles, the Find it at Cornell screen was accessed about 1,200 times.  From there, users clicked on roughly 250 full text articles.  Again, it appears users are connecting to full text via Find Articles and Find it at Cornell a surprisingly small percentage of the time.  At this point, we have only about 35 remote repository connections in Find Articles, whereas Find Databases connects to over 1,000 resources.  Although it is true that not all of the 35 remote repository connections in Find Articles link to full text articles, it is also true that reference linking should allow for a much higher rate of full text access.  Currently, the electronic journals in our “knowledge base” in LinkFinder Plus cover about 90% of our total Cornell subscriptions.
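To put the week 2 figures in proportion, here is the arithmetic as a short Python sketch.  The three counts are the approximate numbers quoted above; the variable and function names are invented.  Note that sessions and screen accesses are not strictly the same unit, so the percentages are rough indications only.

```python
# Approximate week 2 figures quoted in the talk.
sessions = 5000      # Find Articles sessions
link_screens = 1200  # Find it at Cornell screen accesses
full_text = 250      # clicks through to full text


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(share(link_screens, sessions))   # 24.0
print(share(full_text, link_screens))  # 20.8
print(share(full_text, sessions))      # 5.0
```

In other words, roughly a quarter of sessions reached the Find it at Cornell screen, about a fifth of those accesses led to a full text click, and only about one session in twenty ended at full text by this route.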
These results leave us with some important questions:
Are users selecting and searching more non-full-text resources than full text resources through Find Articles?  If they are searching full text resources, are they having difficulty recognizing how to connect to the full text?  Currently, we have the full text link showing up at the bottom of the Find it at Cornell page, below our Catalog search options, as is the default display setting in LinkFinder Plus.  We are in the process of moving the link up above the Catalog search options, so it will be interesting to see if that display order increases the number of connections to full text.  Beyond that, we need to do some user observation to determine how users are navigating Find it at Cornell and whether or not they are missing the links.
Now that we have completed two rounds of log analysis, we have a better idea of how we can use such analysis in future to further our goal of improving the user interface for Find Articles/Find Databases/Find e-Journals.   Here are some of the lessons we have learned:
Data sources are complicated.  We created data sets from the ENCompass logs and the Apache logs.  Within the Apache logs, we had to parse data from several different files, including one for the LinkFinder Plus or Find it at Cornell data.  There are still more data sources we could pull in, including the LFP table in ENCompass, and error log files in Apache. 
Without the custom scripts we had written, the kind of analysis we did would not have been possible.  Given our experience, we would like to see Endeavor consider how to make the ENCompass log data more compatible with the Apache web log data.  Additionally, if the issue with the browser back button leading to false session IDs were corrected, we could perform more meaningful and accurate analysis on some of the data in the ENCompass logs.  We would have a clearer idea of how many actual sessions were being logged and could better analyze how users move about within each session.
When we followed up with our analysis in the fall, it became clear that the only way the work would be useful to the Library would be through clear documentation of what was in fact a complicated, rather piecemeal process.  We explored using SPSS as a tool for formulating a methodology and have plans to better document the work we have done thus far.  We want to turn this data analysis product we have developed into a useful process for the Library.
As expected, our data analysis answered some of our questions and raised important new ones:
For example, we would like to better explore the relationship between Find Databases and Find Articles when it comes to zero hits.  When a user tries a search in Find Databases and gets zero hits, what then?  Does he or she try the same search in Find Articles?  In order to perform this kind of analysis, currently we would need to merge data from the Apache logs and the ENCompass tables.
When a user gets zero hits, what search terms are being used?  We can look at the actual query string from the ENCompass logs and perform an analysis for complexity, i.e., length of string, and misspellings, for both Find Articles and Find Databases.  For Find Databases, we can determine what databases are being misspelled often and add metadata in the object record for those resources in order to improve user access. 
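One way such a misspelling check could be approached is sketched below in Python, using the standard library's difflib to compare a zero-hit query against known database names.  This is an assumption about method, not something the team has built: the function name and the similarity cutoff are invented, the sample queries are made up, and the short name list simply reuses databases mentioned in this talk.

```python
from difflib import get_close_matches

# A few database names mentioned in the talk; a real list would be much longer.
DATABASE_NAMES = ["PubMed", "BIOSIS", "ProQuest", "ArticleFirst"]


def likely_misspelling(query, names=DATABASE_NAMES, cutoff=0.8):
    """Return the database name a zero-hit query probably meant, or None.

    The cutoff of 0.8 is an arbitrary starting point, not a tuned value.
    """
    by_lowercase = {name.lower(): name for name in names}
    cleaned = query.strip().lower()
    if cleaned in by_lowercase:
        return None  # an exact match is not a misspelling
    close = get_close_matches(cleaned, list(by_lowercase), n=1, cutoff=cutoff)
    return by_lowercase[close[0]] if close else None


# Hypothetical zero-hit queries, invented for illustration.
for query in ["pubmedd", "proqest", "labor relations"]:
    print(query, "->", likely_misspelling(query), "| length:", len(query))
```

Names that turn up repeatedly as the probable target of misspelled queries would be candidates for added metadata in the corresponding object records.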
We know that users are encountering search errors but have not delved into the error logs themselves, which are not included within the ENCompass logs. 
We have just begun to scratch the surface with our analysis of Find it at Cornell or LinkFinder Plus.  We have started to perform analysis on the Apache LFP file but have not analyzed data from the LFP file within ENCompass.  We want to know if it would be possible to use usage data to track whether a user indeed clicks on a full text link from Find it at Cornell when it is available. 
After having performed two rounds of usage log analysis on the heels of user testing, we at least have a reasonable idea of how well our system is meeting the needs of our users.  It is clear that we have more work to do, both in continuing to tap our user base via observation and user testing, and in continuing to perform usage log analysis.  I know that this kind of work holds interest for other institutions and would be interested to hear your observations, comments and questions about it.