Handling Results in Fedora’s REST API

Lately I’ve been working to put in more development time with the Fedora repository at the Goodwill Computer Museum.

A PHP ingest interface we’ve set up is certainly the most developed of our repository’s services, but there’s a strong need to relate one object to another as it is being ingested. To do this I want to provide the user with a drop-down menu of objects in the repository that fulfill some criteria (say, the object represents a donator or creator). The user can select one during the ingest phase, relating the ingested object to this other object. That relationship would be recorded in the RELS-EXT datastream as RDF/XML, creating a triple. The predicate of that triple will come from either Fedora’s own ontology [RDF schema] or another appropriate namespace.

Below is PHP code using the cURL client library to call Fedora’s REST API and get this list of relevant objects. I encountered a few stumbling blocks putting this together, so I thought I’d share in case others were curious or looking at a similar problem.

The first step is to compose your query, and then initiate a cURL session with the query.


<?php
$request = "http://your.address.domain:port/fedora/objects?query=yourQuery&resultFormat=xml";
$session = curl_init($request);

curl_setopt($session, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($session);
$responseResult = simplexml_load_string($response);
$resultsArray = array();

// Walk resultList -> objectFields -> each requested field (pid, title, ...).
foreach ($responseResult->{'resultList'} as $result) {
     foreach ($result->{'objectFields'} as $entry) {
          foreach ($entry as $value) {
               $resultsArray[] = (string) $value;
          }
     }
}

// findObjects hands back a session token when more results are waiting.
$token = (string) $responseResult->{'listSession'}->{'token'};
curl_close($session);

// resumeFindObjects: keep requesting pages until no token comes back.
while (!empty($token)) {
     $nextQuery = "http://your.address.domain:port/fedora/objects?sessionToken=" . urlencode($token) . "&query=yourQuery&resultFormat=xml";
     $nextSession = curl_init($nextQuery);

     curl_setopt($nextSession, CURLOPT_RETURNTRANSFER, true);

     $nextResponse = curl_exec($nextSession);
     $nextResponseResult = simplexml_load_string($nextResponse);

     foreach ($nextResponseResult->{'resultList'} as $result) {
          foreach ($result->{'objectFields'} as $entry) {
               foreach ($entry as $value) {
                    $resultsArray[] = (string) $value;
               }
          }
     }

     // An empty token means this was the last page.
     $token = (string) $nextResponseResult->{'listSession'}->{'token'};
     print "$token<br />\n"; // debug output: show each token as it arrives

     curl_close($nextSession);

} //while
?>

On line 2 I’ve specified my query results to be returned as XML and not HTML (resultFormat=xml). This is because I don’t want a simple browser view of the results. I want to work with them first, so XML is appropriate.

On line 5 the cURL option CURLOPT_RETURNTRANSFER is set to ‘true’. This directs cURL to hand back Fedora’s response as the string return value of curl_exec(), stored here in $response, instead of printing it directly.

On line 8 $response, now an XML structure, is loaded into $responseResult as a PHP5 object. The object is a tree structure containing arrays for the result list, the entries, and the entries’ value arrays, all of which we can work through to get to the record values of interest. The specific contents will depend on your query. You can get a good look at the object with print_r():

print_r($responseResult);

The two Fedora REST commands used are findObjects and resumeFindObjects. We need both of these commands because findObjects will not return more than 100 results, regardless of the value you set on maxResults.

Instead it returns the results along with a token. This token is a long-ish string you can then supply to resumeFindObjects, which will continue retrieving your results for you. Just like findObjects, resumeFindObjects will never return more than 100 results, instead giving you another unique token. Once again, you can supply that token to a new resumeFindObjects command to continue getting your results.

The two loops for each of these commands should fill $resultsArray with all the results available in the repository.
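To make the paging logic easier to follow (and to try without a live repository), here is a minimal sketch that pulls the rows and the session token out of a single page of XML. The function name parseFindObjectsPage() and the sample XML are my own, hand-written to mirror the resultList / objectFields / listSession structure described above, with namespace declarations left out. Real responses will carry whatever fields your query requested.

```php
<?php
// Hypothetical helper (not part of Fedora): turns one page of findObjects
// XML into an array of rows plus the session token for the next page.
function parseFindObjectsPage($xmlString)
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        throw new RuntimeException('Response was not well-formed XML');
    }

    $rows = array();
    foreach ($xml->resultList as $list) {
        foreach ($list->objectFields as $fields) {
            $row = array();
            // Each child element is one requested field, keyed by its name.
            foreach ($fields as $name => $value) {
                $row[$name] = (string) $value;
            }
            $rows[] = $row;
        }
    }

    // Empty string when there is no listSession, i.e. no further pages.
    $token = (string) $xml->listSession->token;

    return array('rows' => $rows, 'token' => $token);
}

// Hand-written sample page; the PIDs, titles and token are made up.
$samplePage = '<result>'
    . '<listSession><token>abc123</token></listSession>'
    . '<resultList>'
    . '<objectFields><pid>gcm:1</pid><title>First sample donor</title></objectFields>'
    . '<objectFields><pid>gcm:2</pid><title>Second sample donor</title></objectFields>'
    . '</resultList>'
    . '</result>';

$page = parseFindObjectsPage($samplePage);
print_r($page);
```

Keeping each row as a pid/title pair, rather than one flat list of values, makes it simpler to build the option elements for a drop-down later on.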

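You can use these results in an HTML drop-down. This example reads the pid and title fields straight from $responseResult, the first page of results: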


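<?php
echo "<select name=\"donators\">";
foreach ($responseResult->{'resultList'} as $result) {
	foreach ($result->{'objectFields'} as $entry) {
		// Escape the values so a stray quote or angle bracket cannot break the markup.
		$pid    = htmlspecialchars((string) $entry->pid, ENT_QUOTES, 'UTF-8');
		$title  = htmlspecialchars((string) $entry->title, ENT_QUOTES, 'UTF-8');
		echo "<option value=\"$pid\">$title</option>";
	}
}
echo "</select>";
?>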

Keep in mind that values like $entry->pid and $entry->title are only going to be in the results if those fields have been requested in your queries.
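Here is a rough sketch of composing a findObjects URL that explicitly asks for those fields. The helper name buildFindObjectsUrl(), the localhost address, and the query type~donor are placeholders of mine, so substitute your own repository address and search conditions.

```php
<?php
// Hypothetical helper: builds a findObjects URL that requests specific
// result fields. Each field is switched on by passing fieldName=true.
function buildFindObjectsUrl($baseUrl, $query, array $fields, $maxResults = 20)
{
    $params = array(
        'query'        => $query,
        'maxResults'   => $maxResults,
        'resultFormat' => 'xml',
    );
    foreach ($fields as $field) {
        $params[$field] = 'true';
    }

    // http_build_query() takes care of URL-encoding the values.
    return rtrim($baseUrl, '/') . '/objects?' . http_build_query($params);
}

// Placeholder address and query, for illustration only.
$url = buildFindObjectsUrl(
    'http://localhost:8080/fedora',
    'type~donor',
    array('pid', 'title')
);
echo $url, "\n";
```

The resulting string can be passed to curl_init() in place of the hand-written $request.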

This approach has given me a good understanding of calling and manipulating objects in Fedora through PHP. I have found that setting maxResults to a smaller number (say 5, 10, or 20) is faster than setting it to its maximum of 100. And of course, if you are going to be fetching hundreds or thousands of objects, it’s best not to dump them all in a drop-down or to fetch them all at once.

Migrating Data from MySQL

This is a simple PHP script to write data from a CSV file into a new FOXML file for ingest into the repository.  The process has been straightforward thus far. MySQL provides a quick way to export data from a table into a makeshift CSV file, and PHP’s fgetcsv function helps parse this file into a series of arrays which can be stepped through, value by value. This script has parsed the ~1000 rows of tabular data from the Hardware table into FOXML files. These files have been ingested into Fedora through the batch ingest tool available on the Java administration client.
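As a rough illustration of the approach (not the actual migration script), here is a minimal sketch that reads CSV rows with fgetcsv and turns each into a small FOXML document. The column names (pid, title, description), the gcm: PID, and the function name hardwareRowToFoxml() are all assumptions made for this example. The CSV is read from an in-memory stream so the sketch runs on its own.

```php
<?php
// Hypothetical helper: wraps one CSV row in a minimal FOXML 1.1 document
// carrying a Dublin Core datastream.
function hardwareRowToFoxml(array $row)
{
    $esc = function ($text) {
        return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    };
    $pid         = $esc($row['pid']);
    $title       = $esc($row['title']);
    $description = $esc($row['description']);

    return <<<FOXML
<?xml version="1.0" encoding="UTF-8"?>
<foxml:digitalObject VERSION="1.1" PID="{$pid}"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:objectProperties>
    <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="Active"/>
    <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="{$title}"/>
  </foxml:objectProperties>
  <foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
    <foxml:datastreamVersion ID="DC.0" LABEL="Dublin Core" MIMETYPE="text/xml">
      <foxml:xmlContent>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{$title}</dc:title>
          <dc:description>{$description}</dc:description>
        </oai_dc:dc>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>
FOXML;
}

// Made-up sample data standing in for the MySQL export.
$csv = "pid,title,description\n"
     . "gcm:1,Example 8-bit computer,\"A sample row, not real museum data\"\n";

$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);

// First line holds the column names; every later line is one record.
$header = fgetcsv($handle, 0, ',', '"', '\\');
$documents = array();
while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    if ($fields === array(null)) {
        continue; // skip blank lines
    }
    $row = array_combine($header, $fields);
    $documents[$row['pid']] = hardwareRowToFoxml($row);
}
fclose($handle);

echo count($documents), " FOXML document(s) built\n";
```

In a real run each document would be written to its own file (for instance with file_put_contents) so the batch ingest tool can pick the files up from a directory.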

Having this many records present in the repository has helped clarify how much functionality the base instance of Fedora possesses. The default search tool is fairly capable, and runs through several prominent Dublin Core fields specified in the FOXML file. Except for some notable absences, such as model number (for which there is no analogous DC term), the hardware records are as searchable and discoverable as they are on the MySQL/PHP interface. The major deficiency thus far is the lack of services associated with the RELS-EXT datastreams defined in the FOXML files. Associating these records with images and other parts is key, so this will need further development.


EasyDeposit Work

EasyDeposit looks more and more promising as a straightforward way to manage the deposit interface for Fedora. I have written content models for the main categories of material types (hardware, software, etc.). These content model objects also contain the

<rel:isCollection>true</rel:isCollection>

element elaborated on here, so they can function as “collections” in Fedora. Submitting test objects into these collections correctly gives them a RELS-EXT datastream stating they are a member of this collection, i.e.:

<rel:isMemberOf rdf:resource="gcm-cModel:hardware" />.

This is a good start, I hope.

From this point two key issues stand out. The first is that EasyDeposit defaults to zipping submissions. This is appropriate when the submission is the actual data meant to be preserved, as is the case with a software submission, but is not appropriate when the submission is simply metadata. In this case a simple FOXML file is all that needs to be submitted. I am sure this can be changed somewhere.

The second is that a key feature of the repository is mapping relationships between pieces of hardware and software. Thus, a submitter needs to be able to specify such a relationship, ideally in the submission process. There needs to be a step where the user can pick from a list of relationships and previously submitted target resources. For example, some record isPartOf x and y, or hasPart a and b. Generating a list of possible relationships should be straightforward, but it is likely the user will simply have to specify manually what the target resource is.
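Whatever the interface ends up looking like, the chosen relationship has to be written out as RDF/XML in the object’s RELS-EXT datastream. Here is a minimal sketch of that last step using PHP’s DOM extension. The function name buildRelsExt(), the short list of allowed predicates, and the gcm: PIDs are assumptions for illustration. The rel: namespace is the one for Fedora’s own relationship ontology.

```php
<?php
// Hypothetical helper: builds RELS-EXT RDF/XML relating one object
// (the subject) to one or more target objects with a single predicate.
function buildRelsExt($subjectPid, $predicate, array $targetPids)
{
    $rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    $relNs = 'info:fedora/fedora-system:def/relations-external#';

    // Only offer relationships we have decided to support (assumed list).
    $allowed = array('isPartOf', 'hasPart', 'isMemberOf');
    if (!in_array($predicate, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported relationship: $predicate");
    }

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rdf = $doc->createElementNS($rdfNs, 'rdf:RDF');
    $doc->appendChild($rdf);
    // Declare the rel: prefix once on the root element.
    $rdf->setAttributeNS('http://www.w3.org/2000/xmlns/', 'xmlns:rel', $relNs);

    $description = $doc->createElementNS($rdfNs, 'rdf:Description');
    $description->setAttributeNS($rdfNs, 'rdf:about', 'info:fedora/' . $subjectPid);
    $rdf->appendChild($description);

    // One statement (triple) per target resource.
    foreach ($targetPids as $targetPid) {
        $statement = $doc->createElementNS($relNs, 'rel:' . $predicate);
        $statement->setAttributeNS($rdfNs, 'rdf:resource', 'info:fedora/' . $targetPid);
        $description->appendChild($statement);
    }

    return $doc->saveXML();
}

// Made-up PIDs: record gcm:10 is part of gcm:2 and gcm:3.
$relsExt = buildRelsExt('gcm:10', 'isPartOf', array('gcm:2', 'gcm:3'));
echo $relsExt;
```

The list of allowed predicates doubles as the source for the relationship menu, so the form and the generated RDF cannot drift apart.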

Using SWORD for Deposit

I’ve been looking for suitable front-ends for Fedora. The default installation comes with both a Java administrator client and a newer (though less full-featured) web administration tool. Developers are currently trying to transition over to the web client, adding functionality with each point release. Neither of these clients is suitable as a comprehensive front-end for Fedora, however.

I identified a deposit interface as the foremost component for the front-end, and quickly went looking to the SWORD APP for a solution. SWORD (Simple Web-service Offering Repository Deposit) is a profile of the Atom Publishing Protocol (APP), a successful specification for publishing and editing items on the web. SWORD retools the protocol a tiny bit and emphasizes the request and deposit functions over functions like delete or update. APP is an HTTP-based protocol, so of course SWORD functions over the web. The idea is that a simple protocol like this will find widespread use and acceptance, bolstering the awareness and bulk of repositories by simplifying the deposit process, all while keeping it as a remote function. For instance, there’s a SWORD Facebook app and a (sample) SWORD plug-in for Microsoft Office. Ideally from either of these platforms you can send your work right away to any number of receiving repositories.
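To give a feel for what a deposit looks like on the wire, here is a minimal sketch assuming a SWORD 1.x-style deposit: an HTTP POST of the package to a collection URI, with a few SWORD-specific headers. The function names, file name, and credentials handling are mine and purely illustrative. The second function is defined but not called here, since it needs a running SWORD endpoint.

```php
<?php
// Hypothetical helper: assembles the HTTP headers for a SWORD 1.x-style
// deposit. X-Packaging names the package format; X-No-Op asks the server
// for a dry run.
function buildSwordDepositHeaders($filename, $mimeType, $packaging = null, $noOp = false)
{
    $headers = array(
        'Content-Type: ' . $mimeType,
        'Content-Disposition: filename=' . $filename,
    );
    if ($packaging !== null) {
        $headers[] = 'X-Packaging: ' . $packaging;
    }
    if ($noOp) {
        $headers[] = 'X-No-Op: true';
    }
    return $headers;
}

// Hypothetical helper, not called in this sketch: POSTs the package body
// to a collection URI and returns the HTTP status and response body.
function postSwordDeposit($collectionUrl, $body, array $headers, $user, $password)
{
    $session = curl_init($collectionUrl);
    curl_setopt($session, CURLOPT_POST, true);
    curl_setopt($session, CURLOPT_POSTFIELDS, $body);
    curl_setopt($session, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($session, CURLOPT_USERPWD, $user . ':' . $password);
    curl_setopt($session, CURLOPT_RETURNTRANSFER, true);

    $response = curl_exec($session);
    $status   = curl_getinfo($session, CURLINFO_HTTP_CODE);

    return array('status' => $status, 'body' => $response);
}

// A dry-run deposit of a single (made-up) metadata file.
$headers = buildSwordDepositHeaders('hardware-record.xml', 'text/xml', null, true);
print_r($headers);
```

The collection URI to post to comes from the server’s service document, so nothing about the address needs to be hard-coded beyond that starting point.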

I have set up SWORD with Fedora. It runs as a web application inside Tomcat. After a few kinks everything seems to be ironed out, and it’s a little more clear how SWORD could fit into the repository as a whole. SWORD is best at depositing content, naturally, and it’s best if that content is pre-processed. The out-of-the-box demonstration client isn’t going to provide a way to add metadata to your deposit.

EasyDeposit, written by Stuart Lewis, looks very promising in this area. EasyDeposit is a PHP front-end to SWORD, and allows the user to go through several steps prior to delivering the deposit. It allows the administrator to adjust, add, delete, and create steps, tailoring the process to the organization. For our purposes, we would add steps to receive metadata information about hardware, bibliographic materials, software and so on. I’ve implemented EasyDeposit on our test instance and it does safely deposit content along with some default metadata fields. Concerning the steps already present, configuration is straightforward, requiring modification of .php files. It should be easy to add in collections, or content models that define metadata and behavior requirements, and present them in these steps. Unfortunately (for me), EasyDeposit is presently at 0.1 (although it just got support for the CrossRef API), and documentation is not all there. This doesn’t put EasyDeposit completely out of the running per se, as the interface looks great, and the idea of customizing a template is pretty attractive.

More broadly, SWORD will not serve as an entire front-end solution, and the question becomes whether one wants to join a SWORD interface like EasyDeposit with other Fedora-compliant components for searching and disseminating. Alternatives to this approach are projects like Muradora and Islandora that attempt to provide a more full-featured front-end. I plan to explore these options and get a better idea of the possibilities for full implementation.

Ingesting a Content Model

I wanted to briefly post a Content Model for the repository. This is the “Common Metadata” Content Model, and it has been adapted from the Hydra Project. A digital object can be associated with a content model before, during or after ingestion. The content model in turn can point to a Service Definition object, which defines certain services for the digital object. Those services are in turn specifically defined in a Service Deployment object. The SDep object defines these services with the Web Services Description Language (WSDL). WSDL is an XML format for specifically detailing a web service. Documentation for Fedora 3 states that “Notably, Fedora currently only supports performing disseminations via HTTP GET.” This should be fine for our purposes, and it should make our .wsdl file pretty straightforward.

This is certainly a lot of layers involved in making Fedora really do something, but the modularity is key when you need to make changes on your server. If the port designations for your server have changed, you only need to change a few .wsdl files. If you need to assign new services to a new type of digital object, just add those service definitions to your content model. At least, I hope it’s that simple.

<?xml version="1.0" encoding="UTF-8"?>
<foxml:digitalObject VERSION="1.1" PID="gcm-cModel:commonMetadata"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">

<foxml:objectProperties>
<foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="Active"/>
<foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="Common metadata model"/>
<foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" VALUE="fedoraAdmin"/>
</foxml:objectProperties>

<foxml:datastream ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
<foxml:datastreamVersion ID="RELS-EXT.0" LABEL="External relations" MIMETYPE="text/xml" SIZE="448">
<foxml:xmlContent>
<rdf:RDF xmlns:fedora-model="info:fedora/fedora-system:def/model#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="info:fedora/gcm-cModel:commonMetadata">
    <fedora-model:hasModel rdf:resource="info:fedora/fedora-system:ContentModel-3.0"></fedora-model:hasModel>

    <!--
    A key line here. This says that objects associated with this Content Model will have this service. In this case, since this is a "Common Metadata" Content Model, the service it points to is "gcm-sDef:commonMetadata."
    -->

        <fedora-model:hasService rdf:resource="info:fedora/gcm-sDef:commonMetadata"></fedora-model:hasService>

    </rdf:Description>
</rdf:RDF>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>

<foxml:datastream ID="DS-COMPOSITE-MODEL" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
<foxml:datastreamVersion ID="DS-COMPOSITE-MODEL.0" LABEL="DS composite model" MIMETYPE="text/xml" SIZE="780">
<foxml:xmlContent>
<dsCompositeModel xmlns="info:fedora/fedora-system:def/dsCompositeModel#">
  <dsTypeModel ID="DC">
    <form MIME="text/xml"></form>
  </dsTypeModel>
  <dsTypeModel ID="RELS-EXT">
    <form MIME="text/xml"></form>
  </dsTypeModel>
  <dsTypeModel ID="descMetadata">
    <form MIME="text/xml"></form>
  </dsTypeModel>
</dsCompositeModel>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>

</foxml:digitalObject>

This content model points to one service call (info:fedora/gcm-sDef:commonMetadata). This service definition object will (through an SDep) deliver metadata common to all the objects in the repository: title, type (from the DCMI vocabulary), type (from our own categories), creation date and general description.
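Since disseminations are plain HTTP GET requests, calling the service from PHP should come down to building a URL. Here is a small sketch assuming the REST path pattern /objects/{pid}/methods/{sDefPid}/{method}. The helper name, the localhost address, the object PID gcm:1, and above all the method name getCommonMetadata are placeholders, since the actual method names live in the service definition object.

```php
<?php
// Hypothetical helper: builds the URL for calling a service method
// (a dissemination) on a digital object over Fedora's REST API.
function buildDisseminationUrl($baseUrl, $pid, $sDefPid, $method, array $params = array())
{
    $url = rtrim($baseUrl, '/')
         . '/objects/' . $pid
         . '/methods/' . $sDefPid
         . '/' . $method;

    // Optional method parameters travel as an ordinary query string.
    if (!empty($params)) {
        $url .= '?' . http_build_query($params);
    }
    return $url;
}

// Placeholder address, PID and method name, for illustration only.
$disseminationUrl = buildDisseminationUrl(
    'http://localhost:8080/fedora',
    'gcm:1',
    'gcm-sDef:commonMetadata',
    'getCommonMetadata'
);
echo $disseminationUrl, "\n";
```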

Finally, this content model is missing two compulsory datastreams, a basic Dublin Core datastream and an auditing one. These are added by Fedora automatically when the object is ingested.

Instance of Fedora 3.3 Up

Whew. That was intense.

A bare bones installation of Fedora 3.3 is up and running on the GCM server. Address is http://10.10.24.35:8080/fedora if you’re in Goodwill’s network, authentication required. Installation was actually relatively painless, and took about 1/30 the time I spent setting up a default install of DSpace last year. I’m not sure why this is, since DSpace is supposed to be the more plug-and-play repository software. A key difference here: there is absolutely nothing you can do with Fedora right now, whereas with DSpace you could start building communities and collections from the start.

Fedora is using our regular old MySQL instance to store its information. This will work fine. The next step will be to set up some basic services and become familiar with FOXML, the XML format Fedora uses to describe its digital objects. From there we can begin creating our schema and doing some simple test runs. When this is going smoothly enough, we can think more about putting hooks into the Semantic Web (such as it stands) by pointing to elements in Dublin Core, Friend of a Friend, or other ontologies.

One final prerequisite that does need to get up as soon as possible is regular backup of the server. Our present MySQL database is mailed weekly to a couple different computers, so that data is fairly secure. But the drive running our server has no redundancy, so we’re not as secure as we should be.