Repercussions of Amassed Data

I had the pleasure of meeting Mél Hogan while she was doing her postdoctoral work at CU Boulder. I think her research area is vital, though it’s difficult to summarize. But that won’t stop me, so here goes: investigating how one can “account for the ways in which the perceived immateriality and weightlessness of our data is in fact with immense humanistic, environmental, political, and ethical repercussions” (The Archive as Dumpster).

“Data flows and water woes: The Utah Data Center” is a good entry point for this line of inquiry. The article explores the concerns quoted above (humanistic, environmental, political, and ethical) as they arise at the NSA’s Utah Data Center, near Bluffdale. The facility has suffered outages and other operational setbacks since construction. These initial failures are themselves illuminating, but even assuming such disruptions are minimized in the future, the following excerpt clarifies a few of the material constraints of the effort:

Once restored, the expected yearly maintenance bill, including water, is to be $20 million (Berkes, 2013). According to The Salt Lake Tribune, Bluffdale struck a deal with the NSA, which remains in effect until 2021; the city sold water at rates below the state average in exchange for the promise of economic growth that the new waterlines paid for by the NSA would purportedly bring to the area (Carlisle, 2014; McMillan, 2014). The volume of water required to propel the surveillance machine also invariably points to the center’s infrastructural precarity. Not only is this kind of water consumption unsustainable, but the NSA’s dependence on it renders its facilities vulnerable at a juncture at which the digital, ephemeral, and cloud-like qualities are literally brought back down to earth. Because the Utah Data Center plans to draw on water provided by the Jordan Valley River Conservancy District, activists hope that a state law can be passed banning this partnership (Wolverton, 2014), thus disabling the center’s activities.

As hinted at in a previous post on Lanier, I often encounter a sort of breathlessness in discussions of cloud-based reserves of data and computational prowess. Reflecting on the material conditions of these operations, as well as their inevitable failures and inefficiencies (e.g. the apparently beleaguered Twitter archive at the Library of Congress, though I would be more interested in learning about the constraints and stratagems of private operations), is a wise counterbalance that can help refocus discussions on the humanistic repercussions of such operations. And to be sure, I would not exclude archives from that scrutiny.

Report on American Psychological Association and CIA

NYT reports today:

The American Psychological Association secretly collaborated with the administration of President George W. Bush to bolster a legal and ethical justification for the torture of prisoners swept up in the post-Sept. 11 war on terror, according to a new report by a group of dissident health professionals and human rights activists.

NYT has helpfully provided the referenced report on their site.

The Archives at CU Boulder has been collecting information on the APA Psychological Ethics and National Security (PENS) debate since 2010. See the call for materials, as well as the report NYT has written up today, at the collection site.

Who Owns the Future?

Excerpts from Who Owns the Future?, by Jaron Lanier.

Lanier defines “Siren Servers” as

an elite computer, or coordinated collection of computers, on a network. It is characterized by narcissism, hyperamplified risk aversion, and extreme information asymmetry. It is the winner of an all-or-nothing contest, and it inflicts smaller all-or-nothing contests on those who interact with it.

Hm, I think I can count a few companies running such servers. On the formation of these servers:

Every attempt to create a pure bottom-up, emergent network to coordinate human affairs also facilitates some new hub that inevitably becomes a center of power, even if that was not the intent…. These days, if everything is open, anonymous, and copyable, then a search/analysis company with a bigger computer than normal people have access to will come along to measure and model everything that takes place, and then sell the resulting ability to influence events to third parties. The whole supposedly open system will contort itself to that Siren Server, creating a new form of centralized power. Mere openness doesn’t work.


In what sense is becoming dependent on private spy agencies crossed with ad agencies, which are licensed by us to spy on all of us all the time in order to accumulate billions of dollars by manipulating what’s put in front of us over supposedly open and public networks, a way of defeating elites? And yet that is precisely what the “free” model has meant.

The start of his premise:

To restate the premise of this project, it’s ultimately better to have paid information in order to create a middle class.

I’ve excerpted some of the author’s more forceful passages, but I found Lanier’s take on the future of an information economy — and his alternative model to it — very smart, and very humane.

Disk Imaging Workflow at

Early in January I attended the first-ever BitCurator Users Forum in Chapel Hill. This was a fantastic day with a group of folks interested in the BitCurator project and digital forensics in an archive setting — definitely one of the most information-packed and directly applicable conferences or forums I’ve attended. I’m very much looking forward to next year’s.

I have a post on the BitCurator site on the disk imaging workflow I’m using with students presently, and there’s a great wrap-up of the day as well.

“Preserving the Voices of Revolution”

I have a paper out this month in the American Archivist with my friend and former UT Austin colleague Tim Arnold. The paper centers on best practices for collecting and preserving a collection of tweets, and looks specifically at a collection culled during the protests in Tahrir Square in early 2011. We dig into the difficulties of scoping search terms and users (in the context of the Egyptian Revolution of 2011 and more generally), the constraints of the Twitter API, and how to contextualize the harvesting of thousands of tweets through that API.

Many thanks to the original researchers for collecting the data and to the American Archivist for their interest in the paper.

Preserving the Voices of Revolution: Examining the Creation and Preservation of a Subject-Centered Collection of Tweets from the Eighteen Days in Egypt (SAA) (CU Scholar)

Checksumming till the cows come home

Jon Ippolito, from an interview with Trevor Owens at The Signal:

Two files with different passages of 1s and 0s automatically have different checksums but may still offer the same experience; for example, two copies of a digitized film may differ by a few frames but look identical to the human eye. The point of digitizing a Stanley Kubrick film isn’t to create a new mathematical artifact with its own unchanging properties, but to capture for future generations the experience us old timers had of watching his cinematic genius in celluloid. As a custodian of culture, my job isn’t to ensure my DVD of A Clockwork Orange is faithful to some technician’s choices when digitizing the film; it’s to ensure it’s faithful to Kubrick’s choices as a filmmaker.


As in nearly all storage-based solutions, fixity does little to help capture context. We can run checksums on the Riverside “King Lear” till the cows come home, and it still won’t tell us that boys played women’s parts, or that Elizabethan actors spoke with rounded vowels that sound more like a contemporary American accent than the King’s English, or how each generation of performers has drawn on the previous for inspiration. Even on a manuscript level, a checksum will only validate one of many variations of a text that was in reality constantly mutating and evolving.

In my own preoccupation with disk imaging, generating checksums, and storing them on servers, I forget that at best this is the very beginning of preservation, not an incontestable “ground truth” of the artifact.
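The narrow thing a checksum does guarantee is easy to see in a few lines. What follows is only a minimal sketch in Python, under stated assumptions: the byte strings stand in for two hypothetical copies of a file, the `sha256_hex` and `fixity_ok` helpers are names made up for this illustration, and SHA-256 is just one common choice of checksum algorithm.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def fixity_ok(data: bytes, expected: str) -> bool:
    """Hypothetical fixity check: does `data` still match a stored checksum?"""
    return sha256_hex(data) == expected


# Two made-up "copies" of the same content that differ in a single byte.
copy_a = b"digitized film, reel 1: " + bytes(range(16))
copy_b = copy_a[:-1] + b"\x00"  # final byte changed from 0x0f to 0x00

# The checksum recorded when copy_a was first accessioned.
stored_checksum = sha256_hex(copy_a)

print(fixity_ok(copy_a, stored_checksum))  # the unchanged copy passes
print(fixity_ok(copy_b, stored_checksum))  # the one-byte variant fails
```

The check can say that the bits have or have not changed since the checksum was recorded, and nothing more. Whether `copy_b` would look any different to a viewer, or what either copy meant to anyone, is outside what the digest can express.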

ZZT, Anna Anthropy and Preserving Games

I just finished reading Anna Anthropy’s ZZT, from the Boss Fight Books series. While I have a few issues with the book, I was really happy with her work and felt that it struck a great balance between personal narrative and game history.

On the latter, I’m especially happy that the author has taken pains to convey the culture surrounding the ZZT game and its creation tools. There are two reasons for this which tie well into game preservation.

First, it’s a prime example of discussing games and game development outside the context of entertainment. I previously linked to Jaroslav Švelch’s article in Game Studies, “Say it with a Computer Game”. Anna Anthropy’s book demonstrates how a game (in this case ZZT and the games made from its toolkit) facilitated groups, rivalries, skill demonstrations, personal expression, cultural commentary, and so on.

It’s also a great example of looking beyond gameplay as the final result of game preservation. I recently attended the Born Digital and Cultural Heritage conference in Melbourne, put on by the Play It Again group there. In his keynote Henry Lowood emphasized looking to end products of the preservation process beyond playing the game, such as recordings of play, narratives of play, the cultural materials surrounding the game, etc. ZZT preserves some of the experience of play, and of being enmeshed in that culture, through a wonderful preservation technology that goes criminally under-emphasized: writing.

(As a further example, if you download Stanford’s DOOM collection you’ll have the shareware copy of the game, but along with that you’ll find a wealth of artifacts surrounding the game: .WAD collections, web pages and fan sites, articles, reviews, forum user threads, and the like (and I will add, many copies of beta and alpha versions of the seminal shooter, which I have argued before ought to be a key priority for game archives). It’s an excellent resource and any researcher would want to move through this collection as a way to understand the game and some of its critical context.)