Back in 2002 Consultative Committee for Space Data Systems made a recommendation to the ISO for an Open Archival Information System. The recommendation has found broad acceptance and varying levels of compliance are usually elaborated upon in the digital repository software packages like DSpace. Since we want our archive to have a future as a federated or cooperating (OAIS terms) archive, and since the terminologies and concepts created in this document are widespread, I decided to take some notes on the recommendation as they relate to potential metadata elements we’ll employ.
The recommendation mostly concerns itself with the long term preservation of digital objects, although the framework incorporates metadata for physical objects as well. Broadly, OAIS defines an Information Object as a Data Object coupled with its Representation Information. The Representation Information allows a person to understand how the bits in the Data Object are to be interpreted. An example would be a TIFF file (Data Object) coupled with an ASCII document (Representation Information) detailing the headers, its compression method, etc., like here: TIFF description at Digital Preservation (The Library of Congress). Of course, one might also want Representation Information for the ASCII file, to explain how characters are interpreted in that format. OAIS terms this phenomenon recursive Representation Information and one might eventually accrue a Representation Network of such digital objects. One stops when the Knowledge Base of your Designated Community has the requisite knowledge to understand your top-most piece of Representation Information.
OAIS defines two types of Representation Information: Structure and Semantic. Structure Information describes the data format applied to the bit sequence to derive more meaningful values like characters, pixels, numbers, etc. Semantic Information describes the social meaning behind these higher values (for example that the text characters are English).
OAIS discourages using software that can access and use Data Objects as a replacement for comprehensive Representation Information. Although that would serve the end user well enough for a time, the software itself naturally poses its own obsolescence problem. Of course, the digital media we would like to preserve is mostly software itself. We may have datasets, images, scans, etc., but the majority of digital assets we hold are complete software packages. This includes operating systems, office suites, computer games, console games (on cartridges) and so on. Retrieving Representation Information for all these types of software will be a considerable and ongoing task, as most software will consist multiple file types.