Preservation File Format Standards

Purpose

Digital information is fragile. Digital records rely on computer hardware and software to be interpreted and rendered in a readable format. Therefore, digital records that must be retained for long periods (10 years or longer) are vulnerable when the hardware and software technologies upon which they depend become obsolete. A viable long-term digital preservation program uses technology-neutral, open standard file formats to help mitigate this threat.

Preserving digital information is complicated. Using preferred and supported file formats for digital records that must be retained permanently or for long periods of time will help to:

  • Protect College records as evidence of business transactions and ensure their accessibility and usability for as long as they are needed;
  • Support the authenticity, reliability, and integrity of College records over time;
  • Facilitate retrieval and use of College archival records; and
  • Support the interoperability of College records managed in diverse technical and business domains.

Audience

The intended audience for this standard includes:

  • Emerson College staff and student employees who create or receive records in the ordinary course of performing business functions and processes;
  • IT staff who design, implement, manage, modify, and decommission business applications and records systems;
  • IT staff who manage third party vendor relationships;
  • Department and office staff who administer the Records Retention Policy and the unit’s records management software capabilities.

Overview

This standard describes the file formats that Emerson College will support within the Digital Archives.  The formats are identified as: 

  • Preferred; or
  • Supported.

“Preferred” formats are those that the Archives staff believes will be sustainable over a long period of time. “Supported” formats are commonly used formats that the Archives staff will also preserve in the Emerson College Archives.

Preferred and supported formats are listed in order of preference.

Images (Still)

  • Preferred: TIFF, JPEG2000, PNG
  • Supported: JPG, PNG, JPEG2000 (lossy)

Video

  • Preferred: Motion JPEG2000, QuickTime Movie (uncompressed), MPEG-4
  • Supported: MPEG-2, MPEG-1, AVI

Audio

  • Preferred: AIFF, WAV
  • Supported: MP3

Text Documents

  • Preferred: PDF/A, Plain Text
  • Supported: Unicode Text, PDF, ASCII

Web Archiving

  • Preferred: WARC, ARC
  • Supported: HTML, CSS, PHP, MySQL

This list will evolve over time as new formats are introduced and older formats become obsolete. Any digital content that departments and offices wish to transfer to the Archives for preservation that does not fall into the “preferred” or “supported” category will be evaluated on the basis of its value. Digital content that is deemed of preservation value will be migrated to a “recommended” preservation format, if possible.

Exclusions

There are several exclusions to our digital preservation program:

  • The Emerson College Digital Archives does not intend to provide full preservation for formats listed under the “Unstable or Non-Preservation Formats” heading in the File Format Checklist.
  • No files with viruses will be accepted (this applies especially to the following file formats: DOC, XLS, MDB, PPT, ZIP, EXE). Scan files for viruses with up-to-date virus scanners before transmitting files to the Digital Archives.
  • Full preservation services will not be provided for any files that are fully or partially encrypted, are password-protected, rely on unembedded proprietary fonts, or are compressed with a proprietary compression algorithm.
  • Full preservation services will not be provided for any files produced with Digital Rights Management controls.
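
As an illustration only, part of these exclusions can be pre-checked with a short script before transfer. The following is a minimal Python sketch, not an Archives tool: the names SCAN_FIRST, is_high_risk_extension, and encrypted_zip_members are hypothetical, and the check covers only the file extensions named above and the standard "encrypted" flag on ZIP members. It does not scan for viruses and does not detect DRM, encrypted PDFs, unembedded fonts, or proprietary compression.

```python
import zipfile
from pathlib import PurePath
from typing import List

# Extensions singled out in the exclusions as especially prone to viruses.
# (Hypothetical helper; all files should still be scanned before transfer.)
SCAN_FIRST = {".doc", ".xls", ".mdb", ".ppt", ".zip", ".exe"}


def is_high_risk_extension(filename: str) -> bool:
    """Return True if the file extension is one named in the exclusions."""
    return PurePath(filename).suffix.lower() in SCAN_FIRST


def encrypted_zip_members(zip_source) -> List[str]:
    """List the ZIP members whose 'encrypted' flag (bit 0) is set.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if info.flag_bits & 0x1
        ]
```

A non-empty result from encrypted_zip_members would suggest that the archive needs to be decrypted, or discussed with the Digital Archivist, before transfer.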

File Format Checklist

This section is intended to help Emerson College Departmental Records Officers (DROs) and administrators evaluate and prepare digital content before transferring it to the Emerson College Digital Archives.

Unsupported file formats will not be accepted without prior appraisal and approval by the Digital Archivist.
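
To illustrate how a DRO might pre-screen a transfer against the checklist, the following minimal Python sketch sorts file names into the checklist categories by extension. The mapping is a partial, hypothetical subset of the lists in this section, and the names CHECKLIST and classify are assumptions made for the example. An extension alone cannot distinguish, for instance, uncompressed from compressed TIFF or PDF/A from ordinary PDF, so such extensions are deliberately left out and fall through to appraisal.

```python
from pathlib import PurePath

# Hypothetical, partial mapping drawn from the checklist in this standard.
# Extensions whose category depends on compression or profile
# (e.g. .tif, .wav, .avi, .mov, .pdf, .jp2) are intentionally omitted.
CHECKLIST = {
    ".eml": "preservation",
    ".csv": "preservation",
    ".png": "preservation",
    ".x3d": "preservation",
    ".mj2": "preservation",
    ".rtf": "supported",
    ".doc": "supported",
    ".jpg": "supported",
    ".gif": "supported",
    ".mp3": "supported",
    ".xlsx": "supported",
    ".pptx": "supported",
    ".wpd": "unstable",
    ".psd": "unstable",
    ".swf": "unstable",
    ".wma": "unstable",
    ".wmv": "unstable",
}


def classify(filename: str) -> str:
    """Return 'preservation', 'supported', 'unstable', or 'needs appraisal'."""
    ext = PurePath(filename).suffix.lower()
    return CHECKLIST.get(ext, "needs appraisal")


if __name__ == "__main__":
    for name in ["logo.PNG", "budget.xlsx", "intro.wmv", "scan.tif"]:
        print(name, "->", classify(name))
```

Anything reported as "needs appraisal" would still require review by the Digital Archivist; the sketch is only a first pass.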

Email

  • Preservation Formats:
    • XML
    • PDF/A-1 (ISO 19005-1) (*.pdf)
    • EML (*.eml)
    • PST (*.pst)
  • Supported Formats: Excel (*.xls)
  • Unstable or Non-Preservation Formats: Any other formats not listed here

Text

  • Preservation Formats:
    • Plain Text (encoding: US-ASCII, UTF-8, UTF-16 with BOM)
    • Plain Text (ISO 8859-1)
    • XML 
    • PDF/A-1 (ISO 19005-1) (*.pdf)
  • Supported Formats:
    • PDF (*.pdf)
    • Rich Text (*.rtf)
    • HTML
    • SGML 
    • Microsoft Word (*.doc)
  • Unstable or Non-Preservation Formats:
    • WordPerfect (*.wpd)
    • DVI (*.dvi)
    • Any other formats not listed here
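
The plain-text encodings listed above can be checked mechanically. The following minimal Python sketch (the function name detect_preservation_encoding is hypothetical) reports whether a file's bytes decode as US-ASCII, UTF-8, or UTF-16 with a byte order mark. ISO 8859-1 cannot be confirmed this way, because every byte sequence is valid in that encoding, so a result of None means only that the file is none of the other three.

```python
import codecs
from typing import Optional


def detect_preservation_encoding(data: bytes) -> Optional[str]:
    """Return the first listed encoding that the bytes satisfy, else None."""
    # UTF-16 only qualifies when a byte order mark (BOM) is present.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        try:
            data.decode("utf-16")  # the codec reads the BOM to pick byte order
            return "UTF-16 with BOM"
        except UnicodeDecodeError:
            return None
    # Pure 7-bit content is reported as US-ASCII (it is also valid UTF-8).
    try:
        data.decode("ascii")
        return "US-ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return None
```

In practice the bytes would come from reading the file in binary mode, for example `open(path, "rb").read()`.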

Images (Still)

  • Preservation Formats:
    • TIFF (uncompressed)
    • JPEG2000 (lossless) (*.jp2)
    • PNG (*.png)
  • Supported Formats:
    • BMP (*.bmp)
    • JPEG/JFIF (*.jpg)
    • JPEG2000 (lossy) (*.jp2)
    • TIFF (compressed)
    • GIF (*.gif)
  • Unstable or Non-Preservation Formats:
    • Digital Negative DNG (*.dng)
    • MrSID (*.sid)
    • TIFF (in Planar format)
    • FlashPix (*.fpx)
    • Photoshop (*.psd)
    • RAW
    • JPEG2000 Part 2 (*.jpf, *.jpx)
    • All other image formats not listed here

Graphics

  • Preservation Formats: SVG (no JavaScript binding) (*.svg)
  • Supported Formats: Computer Graphics Metafile (CGM, WebCGM) (*.cgm)
  • Unstable or Non-Preservation Formats:
    • Encapsulated PostScript (EPS)
    • Macromedia Flash (*.swf)
    • All other vector image formats not listed here

Audio

  • Preservation Formats:
    • AIFF (PCM) (*.aif, *.aiff)
    • WAV (PCM) (*.wav) 
  • Supported Formats:
    • MP3 (MPEG-1/2, Layer 3) (*.mp3)
    • Standard MIDI (*.mid, *.midi)
    • Advanced Audio Coding (*.mp4, *.m4a, *.aac)
    • Ogg Vorbis (*.ogg)
    • SUN Audio (uncompressed) (*.au)
    • Free Lossless Audio Codec (*.flac)
  • Unstable or Non-Preservation Formats:
    • AIFC (compressed) (*.aifc)
    • NeXT SND (*.snd)
    • RealNetworks ‘Real Audio’ (*.ra, *.rm, *.ram)
    • Windows Media Audio (*.wma)
    • Protected AAC (*.m4p)
    • WAV (compressed) (*.wav)
    • All other audio formats not listed here
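
Because PCM and compressed WAV files share the *.wav extension, the distinction has to be read from the file itself. The following minimal Python sketch (the function names wav_format_tag and is_pcm_wav are hypothetical) reads the format tag from the "fmt " chunk of a RIFF/WAVE file, where a value of 1 denotes PCM. It is a simplification: files that use the extensible format tag (0xFFFE) may also carry PCM audio, and this sketch reports them as not PCM, so they would need a closer look.

```python
import struct
from typing import BinaryIO, Optional

WAVE_FORMAT_PCM = 0x0001  # format tag value for uncompressed PCM


def wav_format_tag(stream: BinaryIO) -> Optional[int]:
    """Return the format tag from the 'fmt ' chunk, or None if not found."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        return None
    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            return None  # reached end of file without a 'fmt ' chunk
        chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
        if chunk_id == b"fmt ":
            tag_bytes = stream.read(2)
            if len(tag_bytes) < 2:
                return None
            return struct.unpack("<H", tag_bytes)[0]
        # Skip this chunk; chunks are padded to an even number of bytes.
        stream.seek(chunk_size + (chunk_size & 1), 1)


def is_pcm_wav(stream: BinaryIO) -> bool:
    """Return True only when the file declares plain PCM audio."""
    return wav_format_tag(stream) == WAVE_FORMAT_PCM
```

The stream would typically be a file opened in binary mode, for example `open(path, "rb")`.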

Video

  • Preservation Formats:
    • Motion JPEG 2000 (ISO/IEC 15444-4) (*.mj2)
    • AVI (uncompressed, motion JPEG) (*.avi)
    • QuickTime Movie (uncompressed, motion JPEG) (*.mov) 
  • Supported Formats:
    • MPEG-1, MPEG-2 (*.mpg, *.mpeg, wrapped in AVI, MOV)
    • MPEG-4 (H.263, H.264) (*.mp4, wrapped in AVI, MOV) 
    • Ogg Theora (*.ogg)
  • Unstable or Non-Preservation Formats: 
    • AVI (others) (*.avi)
    • QuickTime Movie (others)
    • RealNetworks ‘Real Video’ (*.rv)
    • Windows Media Video (*.wmv)
    • All other video formats not listed here

Spreadsheet/Database

  • Preservation Formats:
    • Comma Separated Values (*.csv)
    • Delimited Text (*.txt)
    • SQL DDL
  • Supported Formats:
    • DBF (*.dbf)
    • OOXML (ISO/IEC DIS 29500) (*.xlsx)
    • Microsoft Excel (*.xls)
  • Unstable or Non-Preservation Formats: All other spreadsheet/database formats not listed here

Virtual Reality

  • Preservation Formats: X3D (*.x3d)
  • Supported Formats: 
    • VRML (*.wrl, *.vrml)
    • U3D (Universal 3D file format)
  • Unstable or Non-Preservation Formats: All other virtual reality formats not listed here

Computer Programs

  • Preservation Formats: None
  • Supported Formats: Computer program source code (*.c, *.c++, *.java, *.js, *.jsp, *.php, *.pl, etc.)
  • Unstable or Non-Preservation Formats: All other computer program formats not listed here

Presentation

  • Preservation Formats: None
  • Supported Formats:  
    • OOXML (ISO/IEC DIS 29500) (*.pptx) 
    • PowerPoint (*.ppt)
  • Unstable or Non-Preservation Formats: All other presentation formats not listed here

Web Archiving

  • Preservation Formats: 
    • WARC
    • ARC
  • Supported Formats:
    • HyperText Markup Language (*.html)
    • CSS Cascading Style Sheets (*.css)
    • Hypertext Preprocessor (*.php)
    • MySQL (*.sql)
  • Unstable or Non-Preservation Formats: All other web formats not listed here

Last revised December 1, 2023