Walker Sampson


Category: metadata

Digital Archives Day, or What I Do Here


Today is the second annual Day of Digital Archives, an effort to promote and explain what those of us working in or with digital archives are doing on a daily basis.

I wrestle with explaining this every time I’m asked what I do here. If I carry on too long, I risk losing my listener’s interest; if the explanation is too short or too technical, my job remains obscure, which is not a good thing. I want my work, and the work in digital archives in general, to be widely understood and appreciated.

And so:

I work to ensure long-term and reliable access to the digital records of Mississippi government.

It’s perhaps wordy, but it’s the best I’ve come up with. It describes what you get from my work (long-term access to the state government’s digital public records) and the scope of the work, without getting into the detailed reality of day-to-day tasks and projects.

So, what is that detailed reality? Here’s a rundown of what I’ve been doing lately:

• Working in Electronic Archives to manage a transition to DSpace for our digital repository. This entails metadata scrubbing (using Google Refine) and normalization to Dublin Core, along with writing import scripts, settling on organizational policies, and server administration.

As an example, we’ve just finished a routine that bundles legacy MARC21 data (both a binary .dat file and XML) with each record in its new home. It’s great to retain this data in case we overlooked a piece of metadata in the migration.

I’m very excited about this project. DSpace will serve as a central store and get us closer to the services we would like to offer users who access digital material here.

• Processing born-digital records from state agencies and offices. This entails description, scanning for confidential data, format migrations, and generating an online access point for the material.

• Delivering electronic records training: I train both state and local government staff on best practices for managing their digital records.

• Managing the department’s Flickr account, where we share scans of our archival photos.

• Sharing duties on the Education Subcommittee of the State Electronic Records Initiative.

• Lately we have been looking into state agency websites and social media to see what content published there should be preserved. I hope to have more news on this in the near future.
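For readers curious what the MARC21 bundling step might involve, here is a minimal sketch in Python. It assumes DSpace’s Simple Archive Format for batch import, in which each item is a directory holding a dublin_core.xml file, the bitstreams themselves, and a contents file listing those bitstreams. The function name, the sample data and the choice of a separate bundle named MARC are hypothetical; this is not our actual routine.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(item_dir: Path, dc_fields: dict[str, str],
                   bitstreams: dict[str, bytes],
                   marc_files: dict[str, bytes]) -> Path:
    """Write one item directory in DSpace Simple Archive Format (sketch).

    Legacy MARC21 files (binary .dat and XML) ride along as extra
    bitstreams in a separate bundle; "MARC" is a hypothetical bundle name.
    """
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one unqualified <dcvalue> per field
    root = ET.Element("dublin_core")
    for element, value in dc_fields.items():
        ET.SubElement(root, "dcvalue", element=element, qualifier="none").text = value
    ET.ElementTree(root).write(item_dir / "dublin_core.xml",
                               encoding="utf-8", xml_declaration=True)

    # contents: one file per line, with a tab-separated bundle option
    lines = []
    for bundle, files in (("ORIGINAL", bitstreams), ("MARC", marc_files)):
        for name, data in files.items():
            (item_dir / name).write_bytes(data)
            lines.append(f"{name}\tbundle:{bundle}")
    (item_dir / "contents").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return item_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        item = write_saf_item(
            Path(tmp) / "item_000",
            {"title": "Example agency report", "date": "1998"},
            {"report.pdf": b"%PDF-1.4 placeholder"},
            {"record.dat": b"placeholder MARC21", "record.xml": b"<record/>"},
        )
        print((item / "contents").read_text(encoding="utf-8"))
```

Keeping the MARC files in their own bundle means they stay attached to the item without being mixed in with the primary content files.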

It’s not an exhaustive list, but it’s some indication of daily work.

So this is the sort of thing I do. It’s a mix of technical work (Unix, Python and Java, for instance) and applying best practices from the records and archives fields to the unique circumstances here. It’s a good field to be in: lots of creative problem solving, lots of new technologies and tools in development, and you get to work with lots of smart, earnest people, which is always a huge plus.
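Of the tasks listed above, the “scanning for confidential data” step is the easiest to illustrate. Here is a minimal sketch in Python, assuming the concern is U.S. Social Security numbers written in the common 123-45-6789 layout inside plain-text files. The pattern and function names are illustrative only; real screening covers many more kinds of sensitive data and file formats, and every hit still needs a person to review it.

```python
import re
from pathlib import Path

# Illustrative pattern for SSN-like strings in the 123-45-6789 layout.
# Skips area numbers 000, 666 and 900-999, plus all-zero group or serial.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def find_possible_ssns(text: str) -> list[str]:
    """Return SSN-like strings found in text, for human review."""
    return SSN_PATTERN.findall(text)


def scan_directory(root: Path) -> dict[str, list[str]]:
    """Map each .txt file under root to the SSN-like strings it contains."""
    hits = {}
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        found = find_possible_ssns(text)
        if found:
            hits[str(path.relative_to(root))] = found
    return hits
```

Calling scan_directory(Path("some/accession")) returns only the files with hits, which keeps the review list short.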

Posted by WXS, October 12, 2012. Filed under: archives, metadata, preservation, records, repository.

On Flickr

Cora Mae Martin. Credit: Mississippi Department of Archives & History

Mississippi Department of Archives and History on Flickr

I’m happy to post that the Mississippi Department of Archives and History now has a Flickr page for our archival material. This is in addition to the Digital Archives we host already, along with numerous other scans scattered about in the catalog which are not exhibited.

I’m optimistic that Flickr will add something important to our online presentations. Along with user feedback in the form of comments and tags, Flickr lets us more quickly highlight and share material that isn’t already exhibited or that exists as a single item outside a collection. We also have our eye on joining The Commons at Flickr once we’ve managed the account for a while.

Some Thoughts on Flickr

So, it’s been a while since Flickr was the new hotness. Instagram, Pinterest, Facebook, Twitter and a handful of other platforms have established themselves as the preferred ways for individuals to share photos. There are also a few articles describing Yahoo’s mismanagement and costly misunderstanding of Flickr’s value and purpose.

(And yes, Flickr missed a few boats, for instance in building out its social network. Check out the vestigial ‘Singleness’ option on your profile: Single, Taken, Open, ‘Rather Not Say’ (distinct, of course, from simply leaving it blank). I’m not sorry to see this one go by.)

I remain convinced, however, that there is simply no better social media platform than Flickr for a cultural institution to share its photos on. Despite some rough years, Flickr still offers the very best space for showcasing this type of material.

  • It gives the photos adequate space for descriptive and technical metadata.
  • It manages and displays high-resolution photos very well.
  • Its grouping mechanism of sets and collections aligns well with the way archives, museums and libraries organize material.
  • It has built-in support for Creative Commons licenses, plus a license option appropriate for archival material: ‘No known copyright restrictions.’
  • Again, The Commons.

And there has been an uptick in activity from the Flickr camp of late, including a splendid uploader and organizer built on HTML5. Flickr still has immense value.

I am especially interested to see how user contributions turn out. This is a subject that cultural institutions on Flickr have discussed before; see this post by Larry Cebula and the discussion on Flickr generated from it. The issue in those discussions is how valuable user contributions really are, given the ratio of great contributions to unhelpful ones.

I can’t help but feel that Flickr could benefit from a filtering or ranking system that elevates and highlights valuable comments and lowers or hides less valuable or incorrect contributions, a solution suggested in the aforementioned Flickr thread. Wikipedia does this through editing. Reddit does this through voting. Stack Exchange does this through voting and a point-based reputation system linked to site privileges. These are all potentially valid ways of emphasizing the good over the not-so-great. Flickr could give purpose and direction to its social network and the resulting content through systems like these (and finally gain the confidence to drop the ‘Singleness’ option from its profile pages).
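To make the voting idea concrete, here is a minimal sketch of one well-known approach: rank each comment by the lower bound of the Wilson score confidence interval on its share of positive votes, which is the basis Reddit has described for its “best” comment sort. It assumes simple up/down votes per comment. The Comment class and sample votes are hypothetical, and nothing here is a Flickr feature.

```python
import math
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    upvotes: int
    downvotes: int


def wilson_lower_bound(up: int, down: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the share of upvotes.

    z = 1.96 corresponds to roughly 95% confidence. Comments with few
    votes get a conservative (low) score until more votes arrive.
    """
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


def rank_comments(comments: list[Comment]) -> list[Comment]:
    """Order comments from most to least confidently useful."""
    return sorted(comments,
                  key=lambda c: wilson_lower_bound(c.upvotes, c.downvotes),
                  reverse=True)
```

This speaks directly to the signal-to-noise worry: a comment with a single upvote does not outrank one with ninety upvotes and ten downvotes, because the score reflects how confident we can be in the ratio, not just the ratio itself.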

There are naturally any number of wonderful contributions, and any number of trivial or silly ones. It’s just that ratio that is the deciding factor. As I say, I’m interested and optimistic that we can get a good community going, and I’m really looking forward to more engagement with patrons and interested persons through the platform.

Posted by WXS, June 19, 2012. Filed under: archives, digital media, metadata.

Format Obsolescence: Maybe Not Such a Bugaboo

I’ve been reading a series of posts by David Rosenthal on his blog analyzing the issue of format obsolescence. Traditionally, and at least in my education, format obsolescence has been treated as one of the great bugaboos of digital preservation. In response, a number of tools and resources have been developed that focus on format identification and validation (DROID, JHOVE, FITS, PRONOM and the upcoming UDFR, to name a few prominent ones). Looking at the preservation landscape, it’s clear that format sustainability has been at the forefront of the collective effort.

Rosenthal, however, makes a convincing argument that this placement of effort is misguided and is not providing the best ROI for the digital preservation community. I won’t repeat his arguments, except to say that Rosenthal places the format obsolescence issue in a historical context which suggests that much has changed in computing since the problem was first framed, and he points to other, much-overlooked areas (bit fixity, storage costs and hardware quality) that are shaping up to be problematic indeed. Here’s a starter set of his posts:

  • Format Obsolescence: Scenarios – April 27, 2007

  • Format Obsolescence: The Prostate Cancer of Preservation – May 7, 2007

  • Format Obsolescence: Right Here Right Now? – January 3, 2008

  • Are Format Specifications Important for Preservation? – January 4, 2009

  • Postel’s Law – January 15, 2009

That should get one started, although there are many, many more posts on the subject. Given those dates, I’m pretty late to the party, but I feel this is required reading for digital preservationists, whether or not you agree with him.

After a few reads you may be running for the nearest self-healing, mirrored ZFS volume, waking up in cold sweats and mumbling on about silent data corruption. Scary.
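Rosenthal’s point about bit fixity is also the easiest to act on. Here is a minimal sketch of a file-level fixity check in Python, assuming files on a local filesystem and SHA-256 checksums: record a manifest at ingest, then re-hash later and compare. The function names are illustrative. A self-healing volume such as ZFS checksums data at the block level on its own; a manifest like this works at the file level and can travel with the content when it moves.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum changed."""
    problems = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "record.txt").write_text("original content")
        manifest = make_manifest(root)
        (root / "record.txt").write_text("silently altered")
        print(verify_manifest(root, manifest))  # ['record.txt']
```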

Posted by WXS, July 18, 2011 (updated September 13, 2012). Filed under: archives, digital media, metadata, preservation, repository.

MITH’s Vintage Computers is Up!


The project I worked on at MITH, MITH’s Vintage Computers, is now up. It’s a catalog and archive of the vintage computer systems and equipment at the center.

The goal is to demonstrate an intuitive and useful way to browse vintage systems. These are complex artifacts with numerous component pieces and software, each of various versions and iterations, with a wealth of important properties and functions, and all from many manufacturers and designers. No model yet exists for conveying this complexity in metadata records. Hopefully the site suggests the utility of such an endeavor, and is at least a little fun to browse.
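To illustrate the kind of nesting involved, here is a minimal sketch in Python of a recursive component model: a system is made of parts that can have parts of their own, each with a manufacturer, version and free-form properties. The field names are hypothetical and are not the schema the site actually uses; the Apple IIe and Apple DOS 3.3 entries simply echo the system documented for the project.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Component:
    """A hardware or software piece of a system (hypothetical fields)."""
    name: str
    kind: str                          # e.g. "hardware" or "software"
    manufacturer: str | None = None
    version: str | None = None
    properties: dict[str, str] = field(default_factory=dict)
    parts: list[Component] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[tuple[int, Component]]:
        """Yield (depth, component) for this component and every sub-part."""
        yield depth, self
        for part in self.parts:
            yield from part.walk(depth + 1)


def by_manufacturer(system: Component, manufacturer: str) -> list[str]:
    """Names of all components, at any depth, from one manufacturer."""
    return [c.name for _, c in system.walk() if c.manufacturer == manufacturer]


if __name__ == "__main__":
    apple_iie = Component("Apple IIe", "hardware", "Apple", parts=[
        Component("Apple DOS 3.3", "software", "Apple", version="3.3"),
    ])
    for depth, comp in apple_iie.walk():
        print("  " * depth + comp.name)
```

Because every level has the same shape, queries such as “all parts from one manufacturer” work the same whether a component sits at the top of the system or three levels down.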

I had an immense amount of help from all the folks at MITH, who shepherded the project from start to finish. I can’t thank them enough for their support and expertise. But still: thanks, everyone!

Posted by WXS, September 10, 2010. Filed under: metadata, MITH, repository.

SNAC: The Social Networks and Archival Context Project

A post over at Inkdroid highlights the SNAC project, an effort to uncover and formalize the person and agent data typically mixed into online archival records. A new standard is associated with the work, but even better is the mention of new techniques to pull this data out of records regardless of whether those records conform to the standard. Of course, as the Inkdroid post points out, microformats could go a long way toward easing that process.
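The Inkdroid post doesn’t spell out SNAC’s extraction techniques, so the following is only a sketch of the general idea: pulling name entries out of an EAD finding aid, whose persname, corpname and famname elements mark people, organizations and families. The sample record and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# EAD name elements for people, organizations and families.
NAME_TAGS = {"persname": "person", "corpname": "organization", "famname": "family"}


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def extract_agents(ead_xml: str) -> list[tuple[str, str]]:
    """Return (agent type, normalized name) pairs, de-duplicated, in document order."""
    root = ET.fromstring(ead_xml)
    seen = []
    for el in root.iter():
        kind = NAME_TAGS.get(local_name(el.tag))
        if kind:
            # Collapse runs of whitespace so "Doe,  Jane" matches "Doe, Jane".
            name = " ".join("".join(el.itertext()).split())
            if name and (kind, name) not in seen:
                seen.append((kind, name))
    return seen


if __name__ == "__main__":
    sample = """<ead><archdesc level="collection">
      <did><origination><persname>Doe, Jane</persname></origination></did>
      <controlaccess><corpname>Example Historical Society</corpname></controlaccess>
    </archdesc></ead>"""
    print(extract_agents(sample))
```

Matching on the local tag name lets the same code read finding aids with or without the EAD namespace declared.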

I really like these sorts of efforts. Agent information and contact profiles are fast becoming first-class entities on the web (see the Mozilla Labs experimental Contacts add-on). Particularly for contact profiles, there may be a backlash against formalizing such data as a resource, perhaps on the grounds that it impinges on the ideal of an anonymous cyberspace so praised during the Internet’s early ascendance. It seems, however, that unless this data is formally handled, privacy and control issues will continue to plague its use.

Posted by WXS, August 24, 2010. Filed under: metadata, repository.

Relations, Relations

Work with Omeka soldiers on. As I write more PHP to handle the relational metadata displayed on the site, I continue to notice some of the constraints of my present setup:

  1. The code specifying the functionality for recognition and categorization of relationship metadata (‘Is Part Of’, ‘Has Part’, ‘Connections’ and now ‘Sister Part’) is inside the custom theme. This breaks the principle of keeping content and display separate so that they can be mixed and matched. If you take away the custom theme, none of the relationship functionality is present. The division was not strict in the first place, but it is even less so now.
  2. Omeka does not provide a way to formally link objects together beyond grouping them into collections. While the Extended Dublin Core plugin provides useful dcterms like ‘hasPart’ and ‘isPartOf’, and while they can be expressed in RDF/XML, the Omeka system itself does not provide a way to hinge functionality upon such relationships. In fact the only way an item’s associated components are displayed is through some string manipulation of the item’s ‘isPartOf’ or ‘hasPart’ values. I cannot just ask Omeka for the related items.
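The string-matching workaround can be sketched independently of Omeka. Here is a minimal Python version, assuming each item’s isPartOf values hold the titles of its containers; it derives the inverse hasPart links so that one can ask for related items in either direction. The function name and item titles are hypothetical, and matching on titles is exactly the fragility described above: rename an item and its links silently break.

```python
from collections import defaultdict


def build_relations(is_part_of: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Derive hasPart links by inverting isPartOf values.

    `is_part_of` maps an item title to the titles in its isPartOf field.
    Titles stand in for identifiers, so any typo breaks a link.
    """
    has_part = defaultdict(list)
    for child, parents in is_part_of.items():
        for parent in parents:
            has_part[parent].append(child)

    # Include containers that are only ever named in another item's isPartOf.
    titles = set(is_part_of) | set(has_part)
    return {
        title: {
            "isPartOf": list(is_part_of.get(title, [])),
            "hasPart": sorted(has_part.get(title, [])),
        }
        for title in titles
    }


if __name__ == "__main__":
    relations = build_relations({
        "Apple IIe": [],
        "Motherboard": ["Apple IIe"],
        "DOS 3.3 disk": ["Apple IIe"],
    })
    print(relations["Apple IIe"])
```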

Despite these constraints, I am not disappointed with the setup. I think Omeka’s presentation strength is perfectly suited to this work and to my timeframe. This particular project is about presenting a model that can be critiqued and shared. To that effect, the Turing test can (more or less) work here: if it is fully functional now and appears semantic and scalable, then it has achieved its purpose. We don’t ask, “Is it extensible, interoperable, and semantic?” We just ask, “Does it work that way? Would such a system be worthwhile?” There is nothing on display now (on the local machine, that is) which could not easily be done elsewhere in a more robust, scalable way if it were deemed important enough. I am thinking particularly of the Fedora repository, which I worked with at the Goodwill Computer Museum and which is well suited to the sort of lateral, semantic relationships this model would provide.

Matt noted this sort of modeling as potentially useful for platform studies. I imagine an entire linked database of such models, where one could index any number of machines by any number of properties and components, would help digital humanists examine machines or platforms across an array of creators and timespans, and would make groupings and associations easier to find.

In any case, going back to the first constraint: addressing it is a matter of abstracting the code enough that it can function as a plugin. Such an effort is considerable, because the plugin would need to achieve the logical linking of items that the present implementation does not. My first instinct would be to manipulate the Omeka MySQL database directly and insert foreign keys and such. While doing this, one would want to keep the configuration flexible enough that the plugin is of use to other Omeka users who need to define relationships in an entirely different way. It’s an interesting project, but also beyond the scope of this one, I think.
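For a sense of what the foreign-key approach might look like, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for MySQL. The items and item_relations tables are hypothetical and are not Omeka’s actual schema. A relation row must point at real items on both ends, and a ‘Sister Parts’ display becomes a single self-join: find everything else that isPartOf the same container.

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE item_relations (
    subject_id INTEGER NOT NULL REFERENCES items(id),
    predicate  TEXT    NOT NULL,   -- e.g. 'isPartOf'
    object_id  INTEGER NOT NULL REFERENCES items(id),
    PRIMARY KEY (subject_id, predicate, object_id)
);
"""

# Items that share a container with the given item, excluding the item itself.
SISTER_PARTS = """
SELECT DISTINCT sib.title
FROM item_relations AS mine
JOIN item_relations AS other
  ON other.object_id = mine.object_id
 AND other.predicate = 'isPartOf'
 AND other.subject_id <> mine.subject_id
JOIN items AS sib ON sib.id = other.subject_id
WHERE mine.subject_id = ? AND mine.predicate = 'isPartOf'
ORDER BY sib.title
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.executescript(SCHEMA)
    return conn


def sister_parts(conn: sqlite3.Connection, item_id: int) -> list[str]:
    return [row[0] for row in conn.execute(SISTER_PARTS, (item_id,))]


if __name__ == "__main__":
    conn = connect()
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     [(1, "Apple IIe"), (2, "Motherboard"),
                      (3, "Power supply"), (4, "DOS 3.3 disk")])
    conn.executemany("INSERT INTO item_relations VALUES (?, 'isPartOf', ?)",
                     [(2, 1), (3, 1), (4, 1)])
    print(sister_parts(conn, 2))  # ['DOS 3.3 disk', 'Power supply']
```

Storing the predicate as a column, rather than hard-coding one table per relationship, is one way to keep the configuration flexible for users who define relationships differently.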

Moving on, this week I’ve added a ‘Sister Parts’ display which shows components that are part of the current item’s container. The upside is more information on display, and faster browsing. The downside is more information on display (read: clutter!).

Beyond this, we are presently working on getting an instance on a live server for some broader access and viewing. Keep your fingers crossed.

Posted by WXS, July 22, 2010. Filed under: digital media, metadata, MITH, repository.

Vintage Computing Equipment, Who Will Have You?

CATALOG Listing for an Apple DOS 3.3 Disk

As an exploration in creating a limit case for computing equipment metadata, I have been documenting Matt’s Apple IIe system to the fullest extent possible. The idea is to observe how much metadata can reasonably be gathered. From this body of data, we may better understand what kind of metadata an institution would want on hand about the systems it curates or archives. It’s been fun.

By the way, the Wikipedia article on the Apple IIe erroneously lists ProDOS as the only OS for the system. Later versions shipped with that OS, but the unit originally shipped with Apple DOS 3.3, and the unit here is one such system. So I’ve changed that. Always good to remind myself that Wikipedia is not authoritative (and a little scary, given how much I use it).

This week I’ve also been examining Omeka. I think Omeka will be well-suited to the task of conveying MITH’s vintage computing resources in a compelling way for the general public, while also accommodating more technical or administrative metadata for use in-house. Omeka as a CMS has some very well developed strengths, and most are geared toward the presentation of museum objects as opposed to archival management of objects. I hope future development on the platform will look into easier data portability and interoperability with systems besides other Omeka instances.

For example, the Dublin Core Extended plugin is great, and as a bonus adds the ability to output an item’s Dublin Core record in RDF/XML. I have put in some trial PHP code to link to a display of this output, along with DCMES/XML and Omeka/XML. This is with the idea that a user may want to take this data and plug it in somewhere else.
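For readers who haven’t seen one, here is a minimal sketch of what a Dublin Core record in RDF/XML looks like, generated with Python’s standard library rather than the plugin’s PHP. The item URI and field values are hypothetical, and the output is illustrative, not the plugin’s exact serialization.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def dc_to_rdfxml(item_uri: str, fields: dict[str, list[str]]) -> str:
    """Serialize simple Dublin Core fields for one item as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": item_uri})
    for element, values in fields.items():
        for value in values:  # repeatable elements become repeated properties
            ET.SubElement(desc, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dc_to_rdfxml("http://example.org/items/1",
                       {"title": ["Apple IIe"], "type": ["PhysicalObject"]}))
```

The rdf:about URI is what lets another system treat the record as a statement about a specific resource, which is the point of handing the data off elsewhere.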

A few impediments to this proposition exist right now. One is that other element sets which may be in use on the Omeka instance (such as CDWA Lite, upcoming VRA Core, etc.) will not show up in those DC outputs, though they will be present in Omeka/XML. It would be useful to export an item’s metadata, across all its element sets, to RDF/XML, since RDF is the functioning backbone of the Semantic Web and of Linked Data.

Second, Omeka allows HTML in a DC value. The developers have covered this feature/bug before, and they have stated that the decision stems from the project’s emphasis on presentation. One could strip these tags with PHP’s strip_tags(), but that would not necessarily leave you with a URI suitable for use in the Semantic Web. I’m keeping my eye on Patrick Murray-John’s Linked Open Omeka plugin, now in development, which tries to address these sorts of issues.
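The gap between stripping tags and getting a usable URI is easy to see in code. Here is a sketch in Python (analogous to, not a port of, PHP’s strip_tags()) that keeps the visible text but also pulls out any href targets, which are the parts a Linked Data export would actually want. The class and function names and the example value are hypothetical, and relative or missing links would still need resolving by hand.

```python
from html.parser import HTMLParser


class _ValueParser(HTMLParser):
    """Collect visible text and any link targets from an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; into &
        self.text_parts: list[str] = []
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def split_html_value(value: str) -> tuple[str, list[str]]:
    """Return (plain text, link URIs) for a DC value that may contain HTML."""
    parser = _ValueParser()
    parser.feed(value)
    parser.close()
    text = " ".join("".join(parser.text_parts).split())
    return text, parser.hrefs


if __name__ == "__main__":
    print(split_html_value(
        'Part of <a href="http://example.org/items/1">Apple IIe</a>'))
```

Plain tag stripping would return only “Part of Apple IIe” and lose the link, which is the one piece that could serve as a URI.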

In any case, this is just an initial exploration of what Omeka does and doesn’t do, to understand the limits of the system. It’s a bit fine-grained for this point in the project.

Opened Apple IIe

So, to pull back a little, the real consideration at this point is the development of relevant metadata for computing equipment. To my knowledge, no element set exists for this. There are certainly a number of ontologies describing a computer. On that note, one ontology files computers under “Office Equipment.” Interestingly, the label on the bottom of the Apple IIe also says “Office Equipment.” Perhaps this is an industry scheme, then? My first guess is an FCC categorization, since I can’t imagine Apple choosing the term for itself.

In any case, these ontologies aren’t really geared toward either archival institutions, which may use these computers for data recovery and access (and research), or museums, which may want to describe them as cultural artifacts. So while the DC element set is suitable as a base, we’re open to other element sets with hierarchies (e.g. MODS) or to a freeform, exploratory generation of a new set. I really don’t think the need for such a set is going to go away soon for archives, as more and more of them find themselves taking in computing equipment as part of a fonds or as office equipment (ha) for working with the archive’s legacy media materials.

Posted by WXS, July 1, 2010 (updated August 3, 2011). Filed under: archives, metadata, MITH, preservation.
Blog at WordPress.com.