I have a new book out with my colleague Heather Ryan, The No-nonsense Guide to Born-Digital Content.
I started drafting chapters for this book in late 2016 when Heather, then the head of the Archives here and now director of the department, approached me about coauthoring the title. I had never written in chapter form before, nor for a more general audience. Approaching my usual stomping ground of born-digital collection material from this vantage was really intriguing, so I jumped at the chance.
To back up a little, our subject here is collecting, receiving, processing, describing and otherwise taking care of born-digital content for cultural heritage institutions. With that scope, we have oriented this book to students and instructors, as well as current practitioners who are aiming to begin or improve their born-digital strategy. We’ve included lots of real-world examples to demonstrate points, and the book as a whole is designed to cover all aspects of managing born-digital content. We discuss everything from collecting policy and forensic acquisition to grabbing social media content and designing workflows. In other words, I’m hoping this provides a fantastic overview of the current field of practice.
Our title is part of Facet Publishing’s No-nonsense series, which provides an ongoing run of books on topics in information science. Facet in general is a great publisher in this space (if you haven’t checked out Adrian Brown’s Archiving Websites, I recommend it), and I’m happy to be a part of it. I thank them for their interest in the book and their immense help in getting it published!
For the last year I have served as Co-PI for a fantastic project, supported by CLIR’s Digitizing Hidden Special Collections and Archives grant program, which centers on the metadata gathering and digitization of the National Snow and Ice Data Center’s (NSIDC) expansive collection of glacier and polar exploration prints within the Roger G. Barry Archives here in Boulder. We have a stellar project archivist leading the work, and we expect to begin posting images on our own site over the course of the year. Stay tuned for that.
The linked article here, posted in the last (ever, actually) issue of GeoResJ, is a good summary of the project scope and value from everyone on the team, including our initial PI now at the University of Denver. We’re really excited to be contributing along with NSIDC to glaciology and earth history through this collection, and are planning on further promotion as processing continues.
Revealing our melting past: Rescuing historical snow and ice data (ScienceDirect)
Last year I attended the Digital Heritage 2015 conference and presented a paper on digital forensics in the archive. The paper centers on collecting file timestamps across floppy disks into a single timeline to increase intellectual control over the material and to explore the utility of such a timeline for a researcher using the collection.
As I state in the paper, temporal forensic data likely constitutes the majority of forensic information acquired in archival settings, and in most cases this information is gathered inherently through the generation of a disk image. While we may expect further use of this data as disk images make their way to researchers as archival objects (and the community’s software, institutional policies and user expectations grow to support it), it is not too soon to explore how temporal forensic data can be used to support discovery and description, particularly in the case of collections with a significant number of digital media.
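To make the idea of a cross-media timeline concrete, here is a minimal sketch in Python. It is not the method used in the paper, only an illustration of the general approach: timestamps gathered per disk are flattened into one list and sorted chronologically. The `FileEvent` record, the disk labels, the file names and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class FileEvent:
    disk_id: str    # hypothetical label for the source floppy or disk image
    path: str       # file path as recorded within that image
    kind: str       # which timestamp this is, e.g. "modified" or "created"
    when: datetime  # the timestamp itself


def build_timeline(events_by_disk: Iterable[Iterable[FileEvent]]) -> List[FileEvent]:
    """Flatten per-disk event lists into one chronologically sorted timeline."""
    merged = [event for disk_events in events_by_disk for event in disk_events]
    # Sort by time first; disk and path only break ties so the order is stable.
    return sorted(merged, key=lambda e: (e.when, e.disk_id, e.path))


def utc(year: int, month: int, day: int) -> datetime:
    """Helper for building the made-up sample dates below."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up sample data standing in for timestamps pulled from two disk images.
disk_a = [
    FileEvent("disk_a", "LETTER.DOC", "modified", utc(1994, 3, 2)),
    FileEvent("disk_a", "NOTES.TXT", "modified", utc(1992, 11, 15)),
]
disk_b = [
    FileEvent("disk_b", "DRAFT1.WPD", "modified", utc(1993, 6, 30)),
]

timeline = build_timeline([disk_a, disk_b])
for event in timeline:
    print(event.when.date().isoformat(), event.disk_id, event.path, event.kind)
```

Sorting on the timestamp interleaves files from different disks, which is what lets a processor or researcher see activity across the whole collection rather than one floppy at a time.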
Many thanks to the organizers of Digital Heritage 2015 for the support and feedback; it was a wonderful and very wide-reaching conference.
Aggregating Temporal Forensic Data Across Archival Digital Media (IEEEXplore) (CU Scholar)
In February, I took part in the first Advanced Topics webinar for the BitCurator Consortium, centered on using the KryoFlux in an archival workflow. My co-participants, Farrell at Duke University and Dorothy Waugh at Emory University, both contributed wonderful insights into the how and why of using the floppy disk controller for investigation, capture and processing. Many thanks to Cal Lee and Kam Woods for their contributions, and Sam Meister for his help in getting this all together.
If you are interested in using the KryoFlux (or do so already), I recommend checking the webinar out, if only to see how other folks are using the board and the software.
An addendum to the webinar for setting up in Linux
If you are trying to set up the KryoFlux in a Linux installation (e.g. BitCurator), take a close look at the instructions in the README.linux text file, located in the top directory of the package downloaded from the KryoFlux site. It covers the dependencies needed and the process for allowing a non-root user (such as bcadmin) to access floppy devices through the KryoFlux. This setup will avoid many permissions problems down the line, as you will not be forced to use the device as root, and I have found it critical to correctly setting up the software in Linux.
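One quick sanity check after following those instructions is to confirm that your non-root account can actually read and write the device node. Here is a minimal sketch in Python, assuming you already know the path of the device node. The path below is a hypothetical placeholder and is not taken from README.linux.

```python
import os


def can_read_write(device_path: str) -> bool:
    """Return True if the current user has read and write access to the path."""
    return os.path.exists(device_path) and os.access(device_path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    # Hypothetical device node; substitute the one your system actually assigns.
    device_path = "/dev/bus/usb/001/004"
    if can_read_write(device_path):
        print(f"OK: current user can read and write {device_path}")
    else:
        print(f"No access to {device_path}; recheck the non-root setup steps")
```

Run it as the non-root user rather than with sudo, since root will pass the check regardless of how the permissions are configured.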
Early in January I attended the first-ever BitCurator Users Forum in Chapel Hill. This was a fantastic day with a group of folks interested in the BitCurator project and digital forensics in an archive setting — definitely one of the most information-packed and directly applicable conferences or forums I’ve attended. I’m very much looking forward to next year’s.
I have a post on the BitCurator site on the disk imaging workflow I’m using with students presently, and there’s a great wrap-up of the day as well.
I have a paper out this month in the American Archivist with my friend and former UT Austin colleague Tim Arnold. The paper centers on best practices for collecting and preserving a collection of tweets, and looks specifically at a collection culled during the protests in Tahrir Square in early 2011. We dig into the difficulties of scoping search terms and users (in the context of the Egyptian Revolution of 2011 and more generally), the constraints of the Twitter API, and how to contextualize the harvesting of thousands of tweets through that API.
Many thanks to the original researchers for collecting the data and to the American Archivist for their interest in the paper.
Preserving the Voices of Revolution: Examining the Creation and Preservation of a Subject-Centered Collection of Tweets from the Eighteen Days in Egypt (SAA) (CU Scholar)
Jon Ippolito, from an interview with Trevor Owens at The Signal:
Two files with different passages of 1s and 0s automatically have different checksums but may still offer the same experience; for example, two copies of a digitized film may differ by a few frames but look identical to the human eye. The point of digitizing a Stanley Kubrick film isn’t to create a new mathematical artifact with its own unchanging properties, but to capture for future generations the experience us old timers had of watching his cinematic genius in celluloid. As a custodian of culture, my job isn’t to ensure my DVD of A Clockwork Orange is faithful to some technician’s choices when digitizing the film; it’s to ensure it’s faithful to Kubrick’s choices as a filmmaker.
As in nearly all storage-based solutions, fixity does little to help capture context. We can run checksums on the Riverside “King Lear” till the cows come home, and it still won’t tell us that boys played women’s parts, or that Elizabethan actors spoke with rounded vowels that sound more like a contemporary American accent than the King’s English, or how each generation of performers has drawn on the previous for inspiration. Even on a manuscript level, a checksum will only validate one of many variations of a text that was in reality constantly mutating and evolving.
In my own preoccupation with disk imaging, generating checksums and storing them on servers, I forget that at best this is the very beginning of preservation, not an incontestable “ground truth” of the artifact.
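As a small illustration of what fixity does and does not tell us, here is a sketch in Python using SHA-256 from the standard library. The byte strings are made-up stand-ins for two copies of a file that differ by a single byte. The checksum registers that the bits changed, but says nothing about whether the experience of the content did.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Made-up stand-ins for two copies of a file that differ by one byte.
original = b"A" * 1024
altered = b"A" * 1023 + b"B"

# An identical copy yields the same checksum: this is all that fixity verifies.
print("identical copy matches:", sha256_hex(original) == sha256_hex(bytes(original)))

# A one-byte difference yields a different checksum, however trivial the change.
print("altered copy matches:  ", sha256_hex(original) == sha256_hex(altered))
```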