I have a paper out this month in the American Archivist with my friend and former UT Austin colleague Tim Arnold. The paper centers on best practices for collecting and preserving tweets, looking specifically at a collection culled during the protests in Tahrir Square in early 2011. We dig into the difficulties of scoping search terms and users (in the context of the Egyptian Revolution of 2011 and more generally), the constraints of the Twitter API, and how to contextualize the harvesting of thousands of tweets through that API.
Many thanks to the original researchers for collecting the data and to the American Archivist for their interest in the paper.
Preserving the Voices of Revolution: Examining the Creation and Preservation of a Subject-Centered Collection of Tweets from the Eighteen Days in Egypt (SAA) (CU Scholar)
Two files whose 1s and 0s differ will, for all practical purposes, have different checksums, yet they may still offer the same experience; for example, two copies of a digitized film may differ by a few frames but look identical to the human eye. The point of digitizing a Stanley Kubrick film isn’t to create a new mathematical artifact with its own unchanging properties, but to capture for future generations the experience we old-timers had of watching his cinematic genius on celluloid. As a custodian of culture, my job isn’t to ensure my DVD of A Clockwork Orange is faithful to some technician’s choices when digitizing the film; it’s to ensure it’s faithful to Kubrick’s choices as a filmmaker.
As with nearly all storage-based solutions, fixity does little to capture context. We can run checksums on the Riverside “King Lear” till the cows come home, and they still won’t tell us that boys played women’s parts, or that Elizabethan actors spoke with rounded vowels that sound more like a contemporary American accent than the King’s English, or how each generation of performers has drawn on the previous one for inspiration. Even at the level of the manuscript, a checksum validates only one of many variants of a text that was in reality constantly mutating and evolving.
In my own preoccupation with disk imaging, generating checksums, and storing them on servers, I forget that this is at best the very beginning of preservation, not an incontestable “ground truth” of the artifact.
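To make the limits of fixity concrete, here is a minimal Python sketch of a checksum-based fixity check using SHA-256 from the standard library’s `hashlib`. The byte strings are made-up stand-ins for two copies of a digitized film, and `sha256_of` and `fixity_ok` are hypothetical helper names, not part of any particular preservation tool. A one-byte change is enough to fail the check, yet the check has no way of telling us whether the two copies would look any different on screen.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def fixity_ok(data: bytes, recorded_digest: str) -> bool:
    """Fixity check: recompute the digest and compare it with the stored one."""
    return sha256_of(data) == recorded_digest


# Made-up stand-ins for two copies of the same digitized film.
original = b"frame-0001 frame-0002 frame-0003"
altered = b"frame-0001 frame-0002 frame-0004"  # differs by a single byte

stored = sha256_of(original)  # digest recorded at ingest

print(fixity_ok(original, stored))  # True
print(fixity_ok(altered, stored))   # False
```

The digest answers one narrow question (are these the same bits as before?), which is exactly why it is the beginning of preservation rather than the whole of it.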
I just finished reading Anna Anthropy’s ZZT, from the Boss Fight Books series. While I have a few issues with the book, I was really happy with her work and felt that it struck a great balance between personal narrative and game history.
On the latter, I’m especially happy that the author has taken pains to convey the culture surrounding ZZT and its creation tools. This ties in well with game preservation, for two reasons.
First, it’s a prime example of discussing games and game development outside the context of entertainment. I previously linked to Jaroslav Švelch’s article in Game Studies, “Say it with a Computer Game”. Anna Anthropy’s book demonstrates how a game (in this case ZZT and the games made from its toolkit) facilitated groups, rivalries, skill demonstrations, personal expression, cultural commentary, and so on.
It’s also a great example of looking beyond gameplay as the final result of game preservation. I recently attended the Born Digital and Cultural Heritage conference in Melbourne, put on by the Play It Again group there. In his keynote, Henry Lowood emphasized end products of the preservation process other than playing the game, such as recordings of play, narratives of play, and the cultural materials surrounding the game. Anthropy’s book preserves some of the experience of play, and of being enmeshed in that culture, through a wonderful preservation technology that goes criminally under-emphasized: writing.
(As a further example, if you download Stanford’s DOOM collection you’ll get the shareware copy of the game, but along with it you’ll find a wealth of artifacts surrounding the game: .WAD collections, web pages and fan sites, articles, reviews, forum threads, and the like. I will add that it also holds many copies of alpha and beta versions of the seminal shooter, which I have argued before ought to be a key priority for game archives. It’s an excellent resource, and any researcher would want to move through this collection as a way to understand the game and some of its critical context.)
The focus on the “pragmatics” of digital game production can help us broaden the range of analogies game studies is working with. Games can be understood as more than just entertainment products or art pieces.
I’m often asked – in the course of my job or by an acquaintance – to explain ‘digital preservation’ and what I mean by it. And as I’m sure others in this field know, a frequent first guess is scanning – you’re scanning stuff, right?
It’s a reasonable and valid guess – digitization can be and is used as a preservation strategy – but it’s a reply that leaves me stumbling, “Yes, but…”, as born-digital content is what a newcomer is most likely to overlook.
Still, I’m often tongue-tied when explaining why born-digital material matters to an individual at a personal level. To some it seems immediately frivolous – perhaps from a notion that the digital enterprise is inherently ephemeral, or that the ‘information superhighway’ – a dated term, but one still with a legacy – is just a media-carrying superstructure over the real stuff.
Not having someone immediately agree with your assumptions startles you into explanation mode. So I reach for a personal example of born-digital vitality. But the truth is that in my recent past I’ve done a pretty good job of preserving the digital materials that are important to me. Setting up a reasonably safe (and this is key: automated) backup routine and checking media health every once in a while goes a long way. So I have no woeful narrative to relate there about personal digital material becoming lost (yet).
So I searched back through my own history for born-digital content I have lost to time. Not just any old content that happened to be lost, but something that meant a lot to me and is simply no more.
I’ve already revisited a near-loss and partial recovery involving a high school art website, so here I recall a complete loss. Nothing remains but the recollection. This loss still smarts today: the code for my QBasic games. Hear my tale of woe, as I record here whatever is left of those projects.
My kingdom for some GOTO code
When my family first purchased a computer, it took me a few years to learn the ropes. I recall some unintended directory deletions while I was learning DOS, and at one point I thought I had truly broken the system with one of these errant deletes. The incident turned out to be only a mistakenly relocated set of files that broke a start-up routine, but not before a few moments of vertigo at the thought that I had broken the family machine.
Eventually I got to understand command line customs, along with the basics of programming in the QBasic IDE, which came standard with MS-DOS and Windows for approximately nine years. Once I got the hang of basic user input and variable handling, I figured it was time to make games in QBasic.
Ah, to be young and just dive in! None of them were ever completed, though this does not bother me. I still believe just diving in is a handy practice.
Lend an ear and I’ll tell you about them.
The first effort was a fantastical text adventure with ANSI-style art inspired by the psychedelic landscapes of Kingdom of Kroz and Epic Megagames’ ZZT, but featuring the simple rules of a Choose Your Own Adventure novel. I got pretty far along before the tedium of hand-drawing scenes row by row with the extended character set wore me down. I was still learning a lot.
The second game was identical in form, but took some less tasteful tones from Bethesda’s The Terminator title – an early stab for that studio at its now-famous open-world design – as well as from the Drugwars DOS game. I got even less far along than with the first game: just a couple of sequences before the player was abruptly dumped back into the sharp blue of QBasic’s IDE. I recall becoming bored and directionless at the monotone grimness the setting required, as well as the tedious, screen-by-screen gameplay.
The third game, and the most involved, was an RPG collaboration with an elementary school friend, very much modeled after the BBS classic Legend of the Red Dragon, but as a single-player affair. We had races, classes, a town, shops, NPCs, and had begun modeling the wilderness areas where the player would encounter whatever had to be fought there. However, school intervened and my friend moved away, and our work stopped there.
I would give my right arm for the source code to any of these projects, but that last one hurts the most. My friend and I spent many hours and long nights developing the RPG – and never got very far – but this piece of digital content represents a huge investment of my enthusiasm and passion at that time. That it is utterly lost is painful. I don’t know what foresight could have saved it, short of keeping the floppies around through sheer neglect. If this were a project nowadays, perhaps a forgotten email attachment could have dredged it up from the bog. Alas, at that time the only network we had was carting floppies between our houses.
There are other losses, such as my old MySpace page, which captures some of my disposition and contacts in the early college years, an embarrassing old fan site for a band I loved in high school, a lost DOOM level .wad – but the absence of this QBasic code hits strongest. This is simply how things get lost, alas – though I sigh wistfully when hearing of old game code being discovered. That someone, amazingly, has managed to create a modern game coded entirely in QBasic just makes me all the more wistful.
Citizens of tomorrow, your digital content – even if, like myself, you are not a heavy user of social media – can be profoundly important to you and very likely to others. Keep an eye on it, as I wish I had.
Just a little post to say I’ll be speaking at the NAGARA e-Records Conference this year in Austin, Texas. I’ll be describing the efforts of CoSA’s State Electronic Records Initiative (SERI) over the past few years – specifically our educative efforts, and the upcoming electronic records training workshops this year and next. These workshops will collectively be attended by every state and territorial archives and records program in the country.
Today is the second annual Day of Digital Archives, an effort to promote and explain what those of us working in or with digital archives are doing on a daily basis.
I wrestle with explaining this every time I’m asked what I do here. If I carry on too long I risk losing my listener’s interest; if the explanation is too short or too technical, my job remains obscure – not a good thing. I want my work, and the work in digital archives in general, to be widely understood and appreciated.
I work to ensure long-term and reliable access to the digital records of Mississippi government.
It’s perhaps wordy, but the best I’ve come up with – it describes what you get from my work (long-term access to digital public records of the state government) and the scope of the work without getting into the detailed reality of day-to-day tasks and projects.
So, what is that detailed reality? Here’s a rundown of what I’ve been doing lately:
• Working in Electronic Archives to manage a transition to DSpace for our digital repository. This entails metadata scrubbing (using Google Refine) and normalization to Dublin Core, along with writing import scripts, settling on organizational policies, and server administration.
As an example, we’ve just finished a routine to bundle legacy MARC21 data (both a binary .dat file and XML) with each record in its new home. It’s great to retain this data in case we overlook a piece of metadata in the migration.
I’m very excited about this project – DSpace will serve as a central store and get us closer to the services we would like to offer users when accessing digital material here.
• Processing born-digital records from state agencies and offices. This entails description, scanning for confidential data, format migrations, and generating an online access point for the material.
• Delivering electronic records training: I train both state and local government employees on best practices in managing their digital records.
• Managing the department’s Flickr account, where we share scans of our archival photos.
• Lately we have been looking into state agency web sites and social media to see what content is being made there that should be preserved. I hope to have more news on this in the near future.
It’s not an exhaustive list, but it’s some indication of daily work.
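The DSpace import work described in the first item of the list above can be sketched roughly as follows. This is a minimal Python illustration, not the import scripts themselves: it assumes DSpace’s Simple Archive Format (one directory per item holding a `dublin_core.xml` file and a `contents` file that lists the item’s bitstreams), and the function name `write_saf_item`, the file names, and the placeholder byte contents are all hypothetical. The legacy MARC21 `.dat` and `.xml` files simply travel with the record as extra bitstreams. Check the layout against the documentation for your DSpace version before relying on it.

```python
import os
import tempfile
from xml.etree import ElementTree as ET


def write_saf_item(item_dir, dc_fields, bitstreams):
    """Write one item in DSpace's Simple Archive Format (layout assumed).

    dc_fields  -- list of (element, qualifier, value); qualifier None -> "none"
    bitstreams -- dict of filename -> bytes, e.g. the record plus its MARC21 files
    """
    os.makedirs(item_dir, exist_ok=True)

    # dublin_core.xml: one <dcvalue> element per Dublin Core field
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_fields:
        field = ET.SubElement(root, "dcvalue",
                              element=element, qualifier=qualifier or "none")
        field.text = value
    ET.ElementTree(root).write(os.path.join(item_dir, "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)

    # Write each bitstream and list its filename, one per line, in "contents"
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as manifest:
        for name, data in bitstreams.items():
            with open(os.path.join(item_dir, name), "wb") as out:
                out.write(data)
            manifest.write(name + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as archive:
        item = os.path.join(archive, "item_000")  # hypothetical item folder
        write_saf_item(
            item,
            [("title", None, "Agency annual report"), ("date", "issued", "2009")],
            {
                "report.pdf": b"%PDF-1.4 placeholder",
                "legacy_marc.dat": b"placeholder MARC21 binary record",
                "legacy_marc.xml": b"<record/>",
            },
        )
        print(sorted(os.listdir(item)))
```

Keeping the MARC files as plain bitstreams, rather than trying to crosswalk every field perfectly, matches the reasoning above: the original data stays on hand if the Dublin Core mapping misses something.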
So this is the sort of thing I do. It’s a mix of technical work – stuff like Unix, Python and Java – and applying best practice in the records and archives fields to the unique circumstances here. It’s a good field to be in – lots of creative problem solving, lots of new technologies and tools in development, and you get to work with lots of smart, earnest people — always a huge plus.
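Scanning born-digital records for confidential data, mentioned in the list above, often starts with simple pattern matching. Here is a minimal Python sketch, assuming the only pattern of concern is a U.S. Social Security number written as NNN-NN-NNNN; `SSN_PATTERN` and `find_possible_ssns` are hypothetical names, the sample text is invented, and a real screening pass would need many more patterns plus a human reviewer for every hit.

```python
import re

# Hypothetical screening pattern: U.S. Social Security numbers written NNN-NN-NNNN.
# The lookaheads skip numbers that are never issued (area 000, 666, 900-999;
# group 00; serial 0000). Unhyphenated SSNs, card numbers, etc. are NOT covered.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def find_possible_ssns(text):
    """Return (line_number, matched_text) for every SSN-like string in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SSN_PATTERN.finditer(line):
            hits.append((lineno, match.group()))
    return hits


sample = "Applicant: J. Doe\nSSN: 123-45-6789\nPhone: 601-555-0100\n"
print(find_possible_ssns(sample))  # [(2, '123-45-6789')]
```

Reporting line numbers rather than redacting automatically keeps the archivist in the loop: a flagged string may be a case number that merely looks like an SSN.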
I’ve recently committed to the Digital Preservation Q&A proposal at StackExchange. This is a resource I really hope comes to fruition, as few sites support the exchange of strategies and advice among people involved in digital preservation, or field questions from people who are familiarizing themselves with the practice.
This latter audience has been on my mind particularly since I left the DPOE program last year. Although we fielded questions over an email listserv, that venue has a few significant weaknesses:
• It’s difficult to bookmark or refer back to advice or information within a thread.
• The email body and thread are not friendly to text formatting, links, and other markup that would make information more readable, digestible and inclusive.
• The information is unstructured: one cannot apply tags, mark a topic as a favorite, vote up a discussion, or track edits in any systematic way.
By contrast, the StackExchange approach is a mix of a question-and-answer site and Wikipedia, with some reward elements to encourage good contributions. The network covers a host of topics, from gardening to LEGOs to electrical engineering. It also hosts Area 51, a site that lists all currently proposed topics that have attracted user interest but are not yet full sites. There’s a lot there, and you’d likely be interested in a few.
Why StackExchange? It features all the methods to structure information I described above. I really can’t imagine a better format (at least, not one already set up and sorted out) for building up a knowledge base in digital preservation, and one that can adjust with time. Digital preservation is a practice that will change immensely over time, and it will raise an assortment of questions and procedures, ranging from obscure rescue efforts to large-scale contemporary migration processes.
As part of the state archives here in Mississippi, I do a good bit of training for state employees on electronic records management and preservation. Required retention periods for born-digital objects can range from three to fifteen or more years, while many are marked for permanent retention and will be deposited here at the archives. The need for careful planning for digital content comes up again and again. A single good resource to point them to would be very welcome.
Consider committing if the topic interests you. It’s especially helpful if you’re already active on other StackExchange sites, and as noted there are a whole lot of topics to choose from, so there’s ample opportunity to get involved. Any interest helps!
I’d like to call attention to a big change for the Goodwill Computer Museum, where I volunteered in Austin, Texas, and worked with many incredibly smart, fun people like Russ Corley, Virginia Luehrsen, Stephen Pipkin, Austin Roche, Phil Ryals and many others.
The big change is that, because of organizational and aspirational differences between Goodwill and the museum, the museum has separated from Goodwill and now continues as an independent nonprofit. In its own words:
We are an Austin, Texas nonprofit organization seeking to inspire and educate the public with engaging exhibits on the evolution of computer history and its influence on our common cultural experience, develop and support digital archival studies through services to universities and other institutions, and conserve computer history information through digital preservation.
I wish them, and Austin Goodwill Computerworks from which the team got its start, the best of luck!
Phil has already posted a thorough, technical and first-hand account of his work as a technician for the Autodin, the Department of Defense’s first computerized message switching system.
I haven’t had much luck getting access to a co-authored game preservation paper through ACM, but along comes this ACM Author-izer and all is well again. The service allows you to authorize a free download of your paper from a specified URL, which is pretty nifty.
I’ve posted the link to the JCDL 2011 paper on my about page.