Handling Results in Fedora’s REST API

Lately I’ve been working to put in more development time with the Fedora repository at the Goodwill Computer Museum.

A PHP ingest interface we’ve set up is certainly the most developed of our repository’s services, but there’s a strong need to relate one object to another as it is being ingested. To do this I want to provide the user with a drop-down menu of objects in the repository which fulfill some criteria (say, the object represents a donor or creator). The user can select one during the ingest phase, relating the ingested object to this other object. That relationship would be recorded in the RELS-EXT datastream as RDF/XML, creating a triple. The predicate of that triple will come from either Fedora’s own ontology [RDF schema] or another appropriate namespace.
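To make the triple idea concrete, here is a minimal sketch of how the RELS-EXT content could be generated in PHP with DOMDocument. The helper name buildRelsExt() and the PIDs are hypothetical, and isPartOf (from Fedora’s relations-external ontology) is used purely as a placeholder predicate; the predicate you actually want may live in another namespace.

```php
<?php
// Sketch: build RELS-EXT RDF/XML stating "subject --predicate--> object".
// buildRelsExt() and the PIDs used below are hypothetical, for illustration only.
function buildRelsExt(string $subjectPid, string $predicate, string $objectPid): string
{
    $rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    // Namespace of Fedora's own relationship ontology.
    $relNs = 'info:fedora/fedora-system:def/relations-external#';

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rdf = $doc->createElementNS($rdfNs, 'rdf:RDF');
    $doc->appendChild($rdf);

    // The subject of the triple is the object being ingested.
    $description = $doc->createElementNS($rdfNs, 'rdf:Description');
    $description->setAttributeNS($rdfNs, 'rdf:about', 'info:fedora/' . $subjectPid);
    $rdf->appendChild($description);

    // The predicate element points at the related object via rdf:resource.
    $relation = $doc->createElementNS($relNs, 'rel:' . $predicate);
    $relation->setAttributeNS($rdfNs, 'rdf:resource', 'info:fedora/' . $objectPid);
    $description->appendChild($relation);

    return $doc->saveXML();
}

$relsExt = buildRelsExt('demo:item1', 'isPartOf', 'demo:donor1');
echo $relsExt;
```

The PID chosen in the drop-down would take the place of the third argument.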

Below is PHP code using the cURL client library to call Fedora’s REST API and get this list of relevant objects. I encountered a few stumbling blocks putting this together, so I thought I’d share in case others were curious or looking at a similar problem.

The first step is to compose your query, and then initiate a cURL session with the query.


<?php
$request = "http://your.address.domain:port/fedora/objects?query=yourQuery&resultFormat=xml";
$session = curl_init($request);

curl_setopt($session, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($session);
$responseResult = simplexml_load_string($response);
$resultsArray = array();

foreach ($responseResult->{'resultList'} as $result) {
     foreach ($result->{'objectFields'} as $entry) {
          foreach ($entry as $value) {
               $resultsArray[] = $value;
          }
     }
}
curl_close($session);

// findObjects returns a session token when more results are waiting.
$token = (string) $responseResult->{'listSession'}->{'token'};

while (!empty($token)) {
     $nextQuery = "http://your.address.domain:port/fedora/objects?sessionToken=" . urlencode($token) . "&query=yourQuery&resultFormat=xml";
     $nextSession = curl_init($nextQuery);

     curl_setopt($nextSession, CURLOPT_RETURNTRANSFER, true);

     $nextResponse = curl_exec($nextSession);
     $nextResponseResult = simplexml_load_string($nextResponse);

     foreach ($nextResponseResult->{'resultList'} as $result) {
          foreach ($result->{'objectFields'} as $entry) {
               foreach ($entry as $value) {
                    $resultsArray[] = $value;
               }
          }
     }
     // An empty token means there are no more results to resume.
     $token = (string) $nextResponseResult->{'listSession'}->{'token'};
     print "$token<br />\n";

     curl_close($nextSession);

} //while
?>

On line 2 I’ve specified that the query results be returned as XML and not HTML (resultFormat=xml). I don’t want a simple browser view of the results; I want to work with them first, so XML is appropriate.

On line 5 the cURL option CURLOPT_RETURNTRANSFER is set to ‘true’. This directs cURL to hand back the response to the Fedora query as the string return value of curl_exec(), stored here in $response, rather than printing it directly.

On line 8 $response, which now holds an XML string, is loaded into $responseResult as a SimpleXMLElement object. The object is a tree structure containing arrays for the result list, the entries, and the entries’ value arrays, all of which we can work through to get to the record values of interest. The specific contents will depend on your query. You can get a good look at the object with print_r():

print_r($responseResult);
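
For a feel of that tree without a live repository, here is a small sketch that parses a hand-written, simplified stand-in for a findObjects response. The PIDs, titles and token are made up, and a real response also carries a namespace declaration and only the fields your query requested.

```php
<?php
// A simplified, made-up response; a real one reflects your own query.
$sampleXml = <<<XML
<result>
  <listSession>
    <token>abc123</token>
  </listSession>
  <resultList>
    <objectFields>
      <pid>demo:1</pid>
      <title>First donor</title>
    </objectFields>
    <objectFields>
      <pid>demo:2</pid>
      <title>Second donor</title>
    </objectFields>
  </resultList>
</result>
XML;

$parsed = simplexml_load_string($sampleXml);

// Each <objectFields> element is one record; cast fields to plain strings.
$records = array();
foreach ($parsed->resultList->objectFields as $fields) {
    $records[] = array(
        'pid'   => (string) $fields->pid,
        'title' => (string) $fields->title,
    );
}

// The token is only present when more results are waiting.
$token = (string) $parsed->listSession->token;

print_r($records);
```

Casting each field to a string leaves you with ordinary PHP arrays, which are easier to pass around than SimpleXMLElement nodes.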

The two Fedora REST commands used are findObjects and resumeFindObjects. We need both of these commands because findObjects will not return more than 100 results, regardless of the value you set on maxResults.

Instead it returns the results along with a token. This token is a long-ish string you can then supply to resumeFindObjects, which will continue retrieving your results for you. Just like findObjects, resumeFindObjects will never return more than 100 results, instead giving you another unique token. Once again, you can supply that token to a new resumeFindObjects command to continue getting your results.

The two loops, one for each of these commands, should fill $resultsArray with all the matching results available in the repository.
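
The same token-following idea can also be folded into a single loop. The following is only a sketch under some assumptions: fetchAllObjects() and $curlFetch are hypothetical names, the URL layout mirrors the placeholder requests above, and the function that actually fetches a URL is passed in so the loop can be tried without a live repository.

```php
<?php
// Sketch: gather every page of results by following session tokens.
// $fetch is any callable that takes a URL and returns the XML response body;
// it is injected so the loop can be exercised without a live repository.
function fetchAllObjects(string $baseUrl, string $query, callable $fetch): array
{
    $records = array();
    $token = '';

    do {
        // The first request is findObjects; later ones add sessionToken.
        $url = $baseUrl . '?query=' . urlencode($query) . '&resultFormat=xml';
        if ($token !== '') {
            $url .= '&sessionToken=' . urlencode($token);
        }

        $page = simplexml_load_string($fetch($url));
        if ($page === false) {
            break; // Unparseable response: stop rather than loop forever.
        }

        foreach ($page->resultList as $list) {
            foreach ($list->objectFields as $fields) {
                // Keep each record together as field name => string value.
                $record = array();
                foreach ($fields as $name => $value) {
                    $record[$name] = (string) $value;
                }
                $records[] = $record;
            }
        }

        // No token in the response means this was the last page.
        $token = (string) $page->listSession->token;
    } while ($token !== '');

    return $records;
}

// One possible fetcher, built on the same cURL calls used above.
$curlFetch = function (string $url): string {
    $session = curl_init($url);
    curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($session);
    return $body === false ? '' : $body;
};
```

Calling fetchAllObjects() with your own base URL, query and $curlFetch would return one array per object rather than a flat list of values.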

You can build an HTML drop-down from the results. The example below walks the first page of results held in $responseResult:


<?php
echo "<select name=\"donators\">";
foreach ($responseResult->{'resultList'} as $result) {
	foreach ($result->{'objectFields'} as $entry) {
		// Escape values so characters like & or " cannot break the markup.
		$pid    = htmlspecialchars((string) $entry->pid);
		$title  = htmlspecialchars((string) $entry->title);
		echo "<option value=\"$pid\">$title</option>";
	}
}
echo "</select>";
?>

Keep in mind that values like $entry->pid and $entry->title are only going to be in the results if those fields have been requested in your queries.

This approach has given me a good understanding of calling and manipulating objects in Fedora through PHP. I have found that setting maxResults to a smaller number (say 5, 10, or 20) is faster than setting it to its maximum of 100. And of course, if you are going to be fetching hundreds or thousands of objects, it’s best not to dump them all in a drop-down or to fetch them all at once.

Puzzle Games for Software?

I just read Robert Patrick’s essay on eMuseums, hosted at Paul McJones’ excellent Dusty Decks blog. It’s a great read and addresses some of the problems of presenting computer history in an effective, and extensible, fashion.

I was specifically interested in Mr. Patrick’s thoughts on presenting software history. Hardware is a more intuitive museum subject in significant ways (its object-ness among them), but of course museums convey subjects which have no direct corresponding object (e.g. touching the actual clothes of a Civil Rights victim) quite well. Still, software remains especially difficult to present in an interesting way.

Mr. Patrick states that software’s workings are opaque to users, and suggests a multithreaded approach to software history that documents the different software types (applications, subroutines, operating systems, etc.) as they emerge, ascend or recede over time as separate threads.

Along with this, I am specifically interested in conveying to the museum goer the architecture, engineering, and writing of software. There is no better way to communicate the human labor, ingenuity, and yes, the toil, that goes into software making. Quoting industry numbers does not tell the museum goer that software is frequently an epic engineering project with considerable drama, not just externally (between departments, coders, and investors), but internally as well (in engineering problem solving). How to convey this drama?

Software is both an engineering and creative endeavor, and it exercises a rich figurative language that suggests physical play and work: variables are passed, objects are created, something is trimmed, cleaned or scrubbed, a request is made, an exception is thrown, a thread stops and starts, etc.

I think this language indicates a way to illustrate the software engineering problem space abstracted away from specific commands and syntaxes. For example, museum visitors could manipulate some system (either a physical system or video game-type piece) with certain constraints emulating those of the coder. Come to think of it, puzzle games do a fine job of such demonstration already (perhaps more Portal than Braid). They could likely be much better demonstrations, of course, if they were directed toward this specific purpose.

I would love to see the day when some of the problems, solutions, tricks, etc. of software engineering are conveyed as well as those of medieval cathedrals or the Giza pyramids.

Migrating Data from MySQL

This is a simple PHP script to write data from a CSV file into a new FOXML file for ingest into the repository.  The process has been straightforward thus far. MySQL provides a quick way to export data from a table into a makeshift CSV file, and PHP’s fgetcsv function helps parse this file into a series of arrays which can be stepped through, value by value. This script has parsed the ~1000 rows of tabular data from the Hardware table into FOXML files. These files have been ingested into Fedora through the batch ingest tool available on the Java administration client.
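
As a rough illustration of the approach, and not the script used here, the following sketch reads a CSV export with fgetcsv and writes one minimal FOXML 1.1 file per row. The column order (pid, title, description), the function names rowToFoxml() and csvToFoxmlFiles(), and the trimmed-down FOXML skeleton with only an inline Dublin Core datastream are all assumptions made for the example.

```php
<?php
// Sketch: turn one CSV row into a minimal FOXML 1.1 document.
// The column order (pid, title, description) is an assumption.
function rowToFoxml(array $row): string
{
    // Escape values so stray &, < or quotes cannot break the XML.
    list($pid, $title, $description) = array_map(
        function ($value) {
            return htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $row
    );

    return <<<FOXML
<foxml:digitalObject VERSION="1.1" PID="{$pid}"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:objectProperties>
    <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="Active"/>
    <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="{$title}"/>
  </foxml:objectProperties>
  <foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X">
    <foxml:datastreamVersion ID="DC1.0" MIMETYPE="text/xml" LABEL="Dublin Core Record">
      <foxml:xmlContent>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>{$pid}</dc:identifier>
          <dc:title>{$title}</dc:title>
          <dc:description>{$description}</dc:description>
        </oai_dc:dc>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>
FOXML;
}

// Read every row of the export and write one FOXML file per record.
// Returns the number of files written.
function csvToFoxmlFiles(string $csvPath, string $outputDir): int
{
    $handle = fopen($csvPath, 'r');
    if ($handle === false) {
        return 0;
    }

    $written = 0;
    // fgetcsv() returns one row at a time as an array, or false at end of file.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($row) < 3) {
            continue; // Skip blank or short lines.
        }
        // PIDs contain a colon, which is awkward in file names.
        $fileName = str_replace(':', '_', $row[0]) . '.xml';
        file_put_contents($outputDir . '/' . $fileName, rowToFoxml($row));
        $written++;
    }

    fclose($handle);
    return $written;
}
```

The resulting directory of .xml files is the kind of input a batch ingest would then pick up.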

Having this many records present in the repository has helped clarify how much functionality the base instance of Fedora possesses. The default search tool is fairly capable, and runs through several prominent Dublin Core fields specified in the FOXML file. Except for some notable absences, such as model number (for which there is no analogous DC term), the hardware records are as searchable and discoverable as they are on the MySQL/PHP interface. The major deficiency thus far is the lack of services associated with the RELS-EXT datastreams defined in the FOXML files. Associating these records with images and other parts is key, so this will continue to need to be developed.
