EasyDeposit – SWORD deposit tool creator

The development of the SWORD (Simple Web-service Offering Repository Deposit) protocol has enabled repositories to start accepting deposits from remote systems and interfaces. If you’re unsure of the basics of SWORD, read one of the following:

However, to date there has not been a great deal of use of SWORD. One of the reasons is a lack of SWORD clients that can deposit items into repositories. Demonstration clients were created by the SWORD project, and a PHP SWORD library was created by the SWORD2 project, but no client has been created that web developers or repository administrators can easily set up for depositors to use.

A bit of background:

Last year, as part of my job at the University of Auckland Library, I had to create a SWORD deposit client to allow PhD candidates to submit an electronic copy of their thesis. We wanted to use SWORD to do this because it means the PhD students do not have to create a repository account and learn how to submit in the repository. The SWORD client was written in PHP and made use of the SWORD PHP library. The client was made up of a very small number of pages: login, enter title of thesis, upload file, select embargo and licensing options, verify, submit.

I then had to create a second similar deposit interface to allow a department to archive a technical report series. This deposit interface was similar, but didn’t have the embargo option, asked for more metadata, and returned the URL of the deposited item in a format that could be inserted into their own web publishing system.

Developing and maintaining two similar but not identical systems seemed wasteful, so I decided to create a generic SWORD deposit interface toolkit that allows new deposit systems to be created easily. EasyDeposit was born!

What is EasyDeposit?

EasyDeposit is a toolkit for easily creating SWORD deposit web interfaces using PHP. To start using EasyDeposit, follow the installation instructions.

How does EasyDeposit work?

EasyDeposit allows you to create customised SWORD deposit interfaces by configuring a set of ‘steps’. A typical flow of steps may be: login, select a repository, enter some metadata, upload a file, verify the information is correct, perform the deposit, send a confirmation email. Alternatively a deposit flow may just require a file to be uploaded and a title entered. A configuration file is used to list the steps you require.

EasyDeposit makes use of the CodeIgniter MVC PHP framework. This means each ‘step’ is made up of two files: a ‘controller’ which looks after the validation and processing of any data entered, and a ‘view’ which controls the web page that a user sees. This separation of concerns makes it easy for web programmers to edit the controllers, and web designers to tinker with the look and feel of the interface in the views.

What ‘steps’ come with EasyDeposit?

EasyDeposit comes with 14 different steps, including:

Extra steps can be easily added just by adding a controller and a view for each new step.

Is EasyDeposit open source?

Yes! It is published with a modified BSD licence.

How do I use EasyDeposit?

Follow the installation instructions! If you have any questions, please leave a comment on this blog entry, or get in touch with me directly.

Posted on February 3, 2010 at 7:14 am by Stuart

Displaying citation counts in DSpace

In the repository world we’ve known for a while now that unless the repository provides value to a researcher, they won’t use it. Nothing pleases a researcher more than to see nice big citation counts for their papers. Wouldn’t it be nice if DSpace repositories could display the citation count for archived papers?


I received an email yesterday about the Scopus API, so thought I’d play with it for a bit of a ‘Friday afternoon experiment’. So here is a quick recipe for adding citation counts to DSpace’s JSPUI:

  1. Register for the Scopus API service: http://searchapi.scopus.com/
  2. Register your website (e.g. http://dspace.example.com/): https://searchapi.scopus.com/developerProfile.url
  3. Download and save this patch (it only edits two files – display-item.jsp and header-default.jsp)
  4. Edit the patch and insert your developer ID where you see XXXXXXXXXXX in the JavaScript
  5. Apply the patch to your DSpace instance
  6. Rebuild and redeploy your DSpace instance
  7. Visit any item that has a DOI stored in the dc.identifier.doi field
  8. Look out for the citation count appearing at the top (if the item has a count of more than 0!)

Posted on October 30, 2009 at 3:44 pm by Stuart

If SWORD is the answer, what is the question?

I’ve just had a new collaborative paper published: ‘If SWORD is the answer, what is the question?’ (DOI: 10.1108/00330330910998057). It covers the most recent iteration of the SWORD repository deposit standard, looks briefly at some issues around the present lack of adoption of SWORD, and most usefully presents seven use cases of SWORD written by their developers:

Lewis, S., Hayes, L., Newton-Wade, V., Corfield, A., Davis, R., Donohue, T., Wilson, S., If SWORD is the answer, what is the question?: Use of the Simple Web-service Offering Repository Deposit protocol, Program: electronic library and information systems, 2009,  Vol 43, Issue 4, pp: 407 – 418, 10.1108/00330330910998057, Emerald Group Publishing Limited

Of course a copy is available open access in our repository: http://hdl.handle.net/2292/5315

Abstract:

Purpose – The purpose of this paper is to describe the repository deposit protocol, Simple Web-service Offering Repository Deposit (SWORD), its development iteration, and some of its potential use cases. In addition, seven case studies of institutional use of SWORD are provided.

Design/methodology/approach – The paper describes the recent development cycle of the SWORD standard, with issues being identified and overcome with a subsequent version. Use cases and case studies of the new standard in action are included to demonstrate the wide range of practical uses of the SWORD standard.

Findings – SWORD has many potential use cases and has quickly become the de facto standard for depositing items into repositories. By making use of a widely-supported interoperable standard, tools can be created that start to overcome some of the problems of gathering content for deposit into institutional repositories. They can do this by changing the submission process from a “one-size-fits-all” solution, as provided by the repository’s own user interface, to customised solutions for different users.

Originality/value – Many of the case studies described in this paper are new and unpublished, and describe methods of creating novel interoperable tools for depositing items into repositories. The description of SWORD version 1.3 and its development give an insight into the processes involved with the development of a new standard.

The seven case studies include a thesis submission system, a SWORD plugin for Moodle, an automated laboratory data repository deposit tool, a desktop deposit tool, the BibApp repository integration module, a custom deposit tool for a technical report series, and the Facebook SWORD deposit tool.

Posted on October 13, 2009 at 9:40 am by Stuart

SWORD PHP Library version 0.9 released + moved to github

I have just released version 0.9 of the SWORD PHP library. It has a few fixes and changes, the most noteworthy being:

These changes have come about following bug reports from other users of the code, and from some enhancements we needed for an exciting configurable SWORD web client that we’ll be unleashing on the world in the next week or two from the University of Auckland Library.

An important change is that the new code is now stored in a git repository at github (although it can still be downloaded from http://php.swordapp.org/):

I’m new to git, so if anyone who knows git better than I do notices that I’m not using it in an optimal way, I’d love to hear about it.

Posted on October 6, 2009 at 9:49 am by Stuart

Library Mashups book – Chapter 17 now Open Access

A new book, ‘Library Mashups – Exploring new ways to deliver library data’, has now been published. The book, edited by Nicole Engard, has a great list of 25 authors from all across the globe, including well known names in the library-tech world such as Tim Spalding, Ross Singer, Bess Sadler and Bonaria Biancu. The chapters cover subjects from the basics such as ‘What is a mashup?’ and ‘Making your data available to be mashed up’, to loads of very specific library-oriented chapters such as ‘Mashing up with librarian knowledge’, ‘Breaking into the OPAC’ and ‘Mashups with Worldcat affiliate services’. There is also a section of the book about interacting with other types of services such as maps, pictures and videos.

Why am I writing about this? Well, for three reasons:

1) The book is great. I’ve learnt a lot from it, and have enjoyed reading it. I particularly like this quote by Tim Spalding (of LibraryThing.com) in his chapter “Breaking into the OPAC”:

As a computer programmer with no experience of the library world, I figured this [helping libraries to add LibraryThing data to their catalogues] would be a simple problem to solve. Of course I found out that the library world was different. The code behind its systems was closed and unextensible, with virtually no APIs in or out.

Read his chapter to hear his experiences and answers.

2) The second reason is that I am one of the lucky authors who has been able to contribute to the book. Chapter 17 is “The Repository Mashup Map” which looks at the development of the Repository66 mashup map of Open Access repositories across the world. The chapter explores why the mashup was created, how it was created, and (hopefully) most usefully some of the design decisions that need to be taken into account when making a mashup (decisions related to when and how to download the data, how to match sources, and when and where to manipulate the data etc).

3) However, the main reason for this blog post is to say that a copy of the chapter has now been published online ‘Open Access’. You can find it in the DSpace repository we run at the University of Auckland Library:

Download URL: http://hdl.handle.net/2292/5258

I hope that you find it useful.

[UPDATE 2/Nov/2009]: Chapter 2 of the book ‘Behind the Scenes: Some Technical Details’ by Bonaria Biancu is now also available open access: http://hdl.handle.net/10281/5117

Posted on October 5, 2009 at 1:41 pm by Stuart

Using GMail with DSpace

From time to time a DSpace repository will send emails. It does this when new users are added, when new items are added, when workflow tasks need to be completed, when exports have completed, and so on. For DSpace production servers this is normally trivial to set up: enter the name of your email server in the configuration file, enter the email address that emails should be sent from, and it usually just works.

However, for developers working on DSpace, it can be a bit harder. For good reasons (e.g. spam reduction), many institutions make their SMTP email servers accessible only from within their network. So if you are working on DSpace from home, a cafe, or on the train, you won’t have access to your institutional mail server unless you bother running a VPN. A good solution to this problem is to use a third-party SMTP server outside your institution while developing. An ideal candidate for this is Google’s GMail.

At present though, DSpace does not allow you to configure some of the more complex aspects of email server configuration, like enabling SSL connections or choosing which SocketFactory to use. These need to be set in order to connect securely to the GMail servers. We’ve had an open issue in the DSpace JIRA issue tracking system to address this, and a fix has now been included ready for DSpace version 1.6. It can easily be backported to earlier versions.

To use the GMail servers, configure dspace.cfg as follows:

# SMTP mail server
mail.server=smtp.gmail.com

# SMTP mail server authentication username and password (if required)
mail.server.username = your-user-name@gmail.com
mail.server.password = your-gmail-password

# Pass extra settings to the Java mail library. Comma separated, equals sign between
# the key and the value.
mail.extraproperties = mail.smtp.socketFactory.port=465, \
           mail.smtp.socketFactory.class=javax.net.ssl.SSLSocketFactory, \
           mail.smtp.socketFactory.fallback=false

If you are a Google Apps user this will also work if you substitute the username with your Google Apps email address.
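
To make the format of mail.extraproperties a little more concrete, here is a minimal Python sketch of how a comma-separated list of key=value pairs like the one above can be split into individual Java mail settings. This is only an illustration of the format, not DSpace's own (Java) code, and the function name parse_extra_properties is made up for the example.

```python
def parse_extra_properties(value):
    """Split 'k1=v1, k2=v2' into a dict, ignoring surrounding whitespace."""
    properties = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        key, _, setting = pair.partition("=")
        properties[key.strip()] = setting.strip()
    return properties


# The value as it reads once the backslash line continuations are joined.
extra = (
    "mail.smtp.socketFactory.port=465, "
    "mail.smtp.socketFactory.class=javax.net.ssl.SSLSocketFactory, "
    "mail.smtp.socketFactory.fallback=false"
)

for key, setting in parse_extra_properties(extra).items():
    print(key, "->", setting)
```

Each resulting key is an ordinary Java mail property name, which is why the SSL port and socket factory can be set this way without DSpace needing a dedicated option for each one.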

Posted on September 5, 2009 at 10:22 am by Stuart

Tweeting temporal tidal data

There are movements worldwide not only to free research publications through Open Access publishing, but also to make data sets free and open. In New Zealand work in this area is being championed by the public OpenGovt.gov.nz site which has a useful open data catalogue of online open government created data sets. Having been involved with Open Access publishing for a few years through my work with open access repositories, I thought I’d better start to get more involved.

One of my favourite Twitter feeds is that of @NZ_quake which is run by Simon Lyall. This twitter feed periodically polls the GeoNet website which lists the latest earthquakes to occur in New Zealand (quite a regular occurrence!). When it sees that a new earthquake has been reported, it sends a tweet:

This got me thinking about other temporal data sets that could be usefully turned into a Twitter feed. Having lived in coastal areas for the past 12 years, my thoughts turned to the tides. Tides are constantly changing, and knowing the current state of the tide can be important. I thought it would be good to create twittering tide tables (or to ‘twitterify’ the name, twides!!!).

Luckily for me, there is plenty of open data in this area. For New Zealand, comprehensive data is provided on the Land Information New Zealand web site. Data is provided for sixteen standard ports, and a further hundred or so secondary ports. The data is available in either CSV or PDF format (I chose the former), and despite the website only offering this year’s and next year’s data, a bit of URL tweaking can also grab the data for 2011 and 2012.

There is no obvious use or re-use licence on the tides page, just a disclaimer and a link to a Crown Copyright declaration which does (commendably) include an open licence:

The material may be used, copied and re-distributed free of charge in any format or media. Where the material is redistributed to others the following acknowledgement note should be shown: “Sourced from LINZ. Crown Copyright reserved.”

A quick script can take this data (one row per day) and re-format it as one tide (high or low) per line with a date-stamp. Another quick little script runs every minute via a cron job, and checks each of the ports to see if it is currently high or low tide there. If it is, it sends a tweet using the Twitter API.
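
The two scripts described above can be sketched roughly as follows. This is a Python illustration only, not the scripts I actually run: the CSV layout (a date followed by time/height pairs) is an assumption rather than the real LINZ format, the function names tides_per_line and tide_at are invented for the example, and sending the tweet itself is left out.

```python
import csv
import io
from datetime import datetime

# Hypothetical input layout (an assumption, not the real LINZ format):
# a date, then up to four time/height pairs, one row per day.
SAMPLE = """\
2009-08-10,04:35,0.6,10:51,3.1,16:58,0.7,23:12,3.0
2009-08-11,05:20,0.7,11:37,3.0,17:45,0.8,,
"""


def tides_per_line(csv_text):
    """Turn one-row-per-day data into one (timestamp, height, kind) per tide."""
    tides = []
    for row in csv.reader(io.StringIO(csv_text)):
        day = row[0]
        for i in range(1, len(row) - 1, 2):
            time, height = row[i], row[i + 1]
            if time and height:  # skip empty trailing pairs
                stamp = datetime.strptime(day + " " + time, "%Y-%m-%d %H:%M")
                tides.append([stamp, float(height), None])
    # Label a tide 'high' if it is higher than its neighbour in the sequence.
    for i, tide in enumerate(tides):
        neighbour = tides[i + 1] if i + 1 < len(tides) else tides[i - 1]
        tide[2] = "high" if tide[1] > neighbour[1] else "low"
    return [tuple(tide) for tide in tides]


def tide_at(tides, now):
    """Return the tide whose timestamp matches 'now' to the minute, or None."""
    now = now.replace(second=0, microsecond=0)
    for stamp, height, kind in tides:
        if stamp == now:
            return stamp, height, kind
    return None


tides = tides_per_line(SAMPLE)
for stamp, height, kind in tides:
    print(stamp.isoformat(), kind, height)

# In the cron job this would be datetime.now(); a fixed time is used here.
hit = tide_at(tides, datetime(2009, 8, 10, 10, 51, 30))
if hit:
    print("Would tweet: %s tide (%.1fm) at %s" % (hit[2], hit[1], hit[0].strftime("%H:%M")))
```

Because the check runs once a minute, matching the timestamp to the minute is enough to make sure each tide is announced exactly once.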

I have created twitter feeds for three New Zealand ports so far:

  1. Auckland
  2. Wellington
  3. Onehunga

There is also a combined feed of all the tides at http://twitter.com/alltwides. If there are any other New Zealand ports that you would like to have a Twitter feed for, please feel free to get in touch as I have a simple script to create new feeds. Or if you know of other tide tables that are exposed via Twitter I’d be interested to see them.

Does Twitter provide a useful outlet for temporal data, or for tide tables? I’d be interested in your opinions! Please leave a comment below.

Posted on August 10, 2009 at 9:34 pm by Stuart

Email your repository

What modern information handling system do we probably interact with most each day? For the majority of us, it is probably our email. We send and receive dozens of emails each day. So how about enabling repository deposit via email?

It has certainly been talked about from time to time, and a plugin to the Thunderbird email client has even been written that allows you to deposit attachments into repositories using SWORD (no mention is made of metadata). This plugin should work with any SWORD enabled repository, but only works with one email client. Wouldn’t it be great if there was a more general solution that worked with all repositories, and all email clients?

Now there is! The latest version of the SWORD PHP library (version 0.8) contains an example script showing SWORD and the PHP library in use. To make it work, just fill out a configuration file with your email address, password and IMAP mailbox details, and your repository login, password and deposit URL. After the configuration file has been filled in, all you need to do is run the script ‘imap-mail.php’ on the command line. The script will connect to your mailbox and look at each unread message. It will package each one up and deposit it into the repository.

How does it work?

It uses the standard PHP IMAP library to connect to your inbox. For each unread message it finds, it extracts the name of the sender of the email, the email subject, and the body of the message. It uses these for metadata:

Along with the metadata, the script adds each email attachment to the deposited item. An example of an email deposited this way can be seen at: http://dspace.swordapp.org/jspui/handle/123456789/318
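
For anyone curious what the extraction step looks like, here is a rough sketch. The real script is PHP and uses the PHP IMAP library; this version uses Python's standard email package instead, skips the IMAP connection entirely, and works on a message built in memory. The function name extract_deposit, the dictionary keys and the sample addresses are all made up for the illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import parseaddr


def extract_deposit(raw_bytes):
    """Pull sender name, subject, body text and attachments out of one message."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    sender_name, sender_address = parseaddr(str(msg["From"]))
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content().strip() if body_part else ""
    attachments = [
        (part.get_filename(), part.get_content_type(), part.get_content())
        for part in msg.iter_attachments()
    ]
    return {
        "sender_name": sender_name or sender_address,
        "subject": str(msg["Subject"]),
        "body": body,
        "attachments": attachments,
    }


# Build a stand-in for an unread message fetched from the mailbox.
sample = EmailMessage()
sample["From"] = "Ann Depositor <depositor@example.com>"
sample["To"] = "deposit@example.org"
sample["Subject"] = "My technical report"
sample.set_content("A short description of the report.")
sample.add_attachment(b"%PDF-1.4 ...", maintype="application",
                      subtype="pdf", filename="report.pdf")

deposit = extract_deposit(sample.as_bytes())
print(deposit["sender_name"], "|", deposit["subject"], "|", deposit["body"])
for name, mimetype, data in deposit["attachments"]:
    print(name, mimetype, len(data), "bytes")
```

The three text values would then be handed to the packager as metadata, and each attachment added to the package as a file.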

If you want to try it out, but don’t want to set it all up, for a limited time I’ll leave it running against the deposit@swordapp.org mailbox. Send an email to deposit@swordapp.org and when I periodically run the script, your email will be deposited into the test DSpace/SWORD repository at http://dspace.swordapp.org/. Please bear in mind that the repository is open access to the world, so anyone can see what your email and optional attachments contain! (Please consider working hours / timezones etc when working out when I am likely to next run the script! You will receive a confirmation email when your deposit has taken place.)

A few further thoughts:

If I get time, my next extension to the PHP SWORD library will be a basic web client (similar to http://client.swordapp.org/, except that it will be written in PHP and will create packages from files for you). If you have any other suggestions, please leave a comment!

Posted on July 28, 2009 at 8:19 pm by Stuart

Direct from MS Word to DSpace via SWORD

As a member of the SWORD project, it has been great seeing Microsoft’s External Research group integrate SWORD into Word 2007, their Zentity repository, and their online journal hosting system. There is a good overview of this work in a presentation given by Pablo Fernicola at the Open Repositories 2009 conference entitled ‘Connecting Authors and Repositories Through SWORD’.

This blog post is about the functionality I have added to DSpace to allow it to accept deposits from within Microsoft Word using SWORD.

If you are unaware of the authoring add-in, then before reading the rest of this blog, take a look at Pablo’s YouTube video ‘Integrating with repositories and journal submissions’ at http://www.youtube.com/watch?v=2_M2gfUyVzU. The video explains the authoring add-in, so I’ll not duplicate that information in this blog post. The rest of this post explains how I extended DSpace to work with the add-in…

In order for DSpace to be able to ingest a package, it needs an ingester that understands the format and knows how to unpack it and extract the metadata and file(s). In the case of .docx files created by Microsoft Word, it needs to know how to extract the metadata from within the file, and to archive the file as-is. This is a pretty easy task as a .docx file is actually just a zip file (try renaming it from .docx to .zip and then take a peek inside!). So I wrote an ingester that unzips the file, extracts the NLM metadata that the add-in inserted in the file, and then creates a new DSpace item with that metadata. Finally it adds the complete .docx file as a bitstream for people to download.

Some of the metadata, such as the authors’ identities, is held in the customXml/item*.xml files inside the .docx file, and other parts, such as the article title and abstract, are held in the actual document contents in word/document.xml. The ingester extracts these values for use in the new DSpace item.

<w:t>Add an S to Microsoft Word and you get SWORD</w:t>
<my:name.>
<my:name.content-type.datatypeattribute.attribute.></my:name.content-type.datatypeattribute.attribute.>
<my:name.name-style.datatypeattribute.attribute.></my:name.name-style.datatypeattribute.attribute.>
<my:surname.>Lewis</my:surname.>
<my:given-names.>Stuart</my:given-names.>
</my:name.>
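
To show how little is involved in getting at those files, here is a small Python sketch (the real ingester is Java code inside DSpace, so this is an illustration of the idea only). It builds a tiny stand-in archive in memory with the two kinds of entry mentioned above, then reads them back the way an ingester would. The function name read_docx_parts is invented, and the regular expressions are a shortcut for these bare fragments; a real ingester would use a proper namespace-aware XML parser.

```python
import fnmatch
import io
import re
import zipfile


def read_docx_parts(docx_bytes):
    """Return (custom XML parts, main document XML) from a .docx, which is a zip."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as docx:
        custom_parts = {
            name: docx.read(name).decode("utf-8")
            for name in docx.namelist()
            if fnmatch.fnmatchcase(name, "customXml/item*.xml")
        }
        document_xml = docx.read("word/document.xml").decode("utf-8")
    return custom_parts, document_xml


# Build a minimal stand-in for a .docx file (not a valid Word document).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml",
                     "<w:t>Add an S to Microsoft Word and you get SWORD</w:t>")
    archive.writestr("customXml/item1.xml",
                     "<my:surname.>Lewis</my:surname.>"
                     "<my:given-names.>Stuart</my:given-names.>")

custom_parts, document_xml = read_docx_parts(buffer.getvalue())
title = re.search(r"<w:t>(.*?)</w:t>", document_xml).group(1)
surname = re.search(r"<my:surname\.>(.*?)</my:surname\.>",
                    custom_parts["customXml/item1.xml"]).group(1)
print(title)
print(surname)
```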

I then configured the DSpace ingesters to use the docx ingester when it encountered .docx files:

plugin.named.org.dspace.content.packager.PackageIngester = \
org.dspace.content.packager.PDFPackager  = Adobe PDF, PDF, \
org.dspace.content.packager.DSpaceMETSIngester = METS, \
org.dspace.content.packager.DSpaceDocxIngester = DOCX

I then configured the SWORD package to expose the fact that it supported .docx files in its SWORD service document:

sword.accept-packaging.Docx.identifier = application/vnd.openxmlformats-officedocument.wordprocessingml.document
sword.accept-packaging.Docx.q = 1.0

Finally the DSpace SWORD interface needed to know which packager to use for .docx files based on their MIME type:

plugin.named.org.dspace.sword.SWORDIngester = \
org.dspace.sword.SWORDMETSIngester = http://purl.org/net/sword-types/METSDSpaceSIP \
org.dspace.sword.SimpleFileIngester = SimpleFileIngester \
org.dspace.sword.DocxIngester = application/vnd.openxmlformats-officedocument.wordprocessingml.document

All that is needed to use this is a copy of the authoring add-in (http://research.microsoft.com/en-us/projects/authoring/), and a suitably formatted template for the repository that you wish to deposit the document into (dspace-swordapp-org.docx). The template is preconfigured to deposit directly into the DSpace SWORD demo repository which I have upgraded with the new code to accept .docx deposits. Feel free to create an account in that repository, install the add-in, load the template, and try out a deposit!

This complete end to end process allows you to create Word templates, and to mark them up with required and optional fields. It also allows you to embed details of the SWORD deposit repository URL (so the users do not need to know what it is) within the template for easy deposit. This could be used for example for a journal editor to provide a template and a deposit location for new paper submissions all-in-one. And this use case could be extended: for example if a faculty member wants all their students to submit an assignment with a template, they could do so and use the repository as the end point rather than a traditional VLE. And unlike a VLE, the repository will probably provide search and indexing facilities across the deposited documents. I’m sure as this tool gets used more, there will be a lot of new ideas for how it can be used.

Comments welcome! :)

Posted on July 4, 2009 at 8:23 pm by Stuart

SWORD PHP Library version 0.7 released

I have just released version 0.7 of the SWORD PHP library. It can be downloaded from http://php.swordapp.org/

This latest version adds two new features:

To show how easy it is to use the library, see the following code which requests a service document, creates a package, and then deposits it:

// Import the library
require('swordappclient.php');

// Create an instance of the client
$sac = new SWORDAPPClient();

// Request a service document
$sdr = $sac->servicedocument($url, $user, $password, $onbehalfof);

// Import the packager library
require('packager_mets_swap.php');

// Create a new package with the root and directory of the input files, and the root and directory of the package
$package = new PackagerMetsSwap($rootin, $dirin, $rootout, $fileout);

// Add metadata to the package
$package->setType($test_type);
$package->setTitle($title);
$package->setAbstract($abstract);
foreach ($creators as $creator) {
    $package->addCreator($creator);
}

// Add a file to the package
$package->addFile($filename, $mimetype);

// Now deposit the package
$dr = $sac->deposit($depositurl, $username, $password, $onbehalfof, $filename, $packageformat, $packagecontenttype);

Please send requests or leave a comment for features for the next version.

Posted on June 23, 2009 at 9:03 pm by Stuart