DigiPres Commons: Community-owned digital preservation resources

The digital preservation community is small and under-resourced. This means we have to work together if we want to make a real impact. This site aims to provide a gateway to all of the wonderful community-owned and community-oriented resources out there that are dedicated to digital preservation.

Digital Preservation Community Resources

Get Started

Save Digital Stuff Right Now

Spotted digital data at risk, but don’t know who can save it?

Preserve Your Own Stuff

Become Part Of The Digital Preservation Community

Advance digital preservation by pooling our experience, sharing our war stories and finding the answers to the big questions.

Real Data and Requirements

Real data, real challenges and real requirements make your own and others' digital preservation developments far more useful and effective.

Test Corpora

To improve our digital preservation tools, we need to be able to test them and evaluate their performance. Publicly available sample files make this much easier. Tool developers can use them to test their work, discover bugs, and hone their tools ready for others to use. A test corpus can contain real digital objects from a collection, or be created specifically to exhibit certain characteristics for testing purposes. Real data, especially examples of broken, badly formed or corrupted files, can be particularly useful.

Note that OPF also has its own corpora page.

Multi-format Corpora

Format-specific Corpora

Building Corpora

If the existing corpora aren’t cutting it, perhaps you can contribute to the OPF Format Corpus (hosted on GitHub). There’s a guide here on how to contribute or you can contact OPF for help on how to get involved.

Sourcing test files from web archives

Web archives can provide a useful source of files of particular formats. For example, search via the UKWA interface.


Software tools give us the means to interrogate, manipulate, understand and ultimately preserve our digital data.

Building Workflows

Resources to help build up preservation workflows, e.g. templates for how to use command-line tools, and how to chain things together.

Understanding Formats

We need to understand the file formats of the resources we care for, and the software they depend on.

Improving Identification

Identifying file formats is the bread and butter of digital preservation characterisation and assessment. Identification tool coverage and accuracy could be much better, and this primarily comes down to the signatures, or file format “magic”, used to identify each format. You can help contribute and make our identification tools more effective here:

If you want to start putting this into practice, you can identify file formats right now (with no installation or setup) using FIDOO, or alternatively check out standalone file format identification tools.
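To make the idea of file format "magic" concrete, here is a minimal Python sketch of signature-based identification. It is an illustration only and is not how any particular identification tool is implemented: the `SIGNATURES` table, the format labels and the function names `identify_bytes` and `identify_file` are all assumptions made for this example. The byte sequences themselves are the widely documented leading bytes of each format.

```python
from pathlib import Path

# Illustrative signature table: (format label, byte offset, magic bytes).
# Assumption: a real signature registry holds far more entries and much
# richer patterns than these fixed leading byte sequences.
SIGNATURES = [
    ("PNG image", 0, b"\x89PNG\r\n\x1a\n"),
    ("PDF document", 0, b"%PDF-"),
    ("GIF image", 0, b"GIF87a"),
    ("GIF image", 0, b"GIF89a"),
    ("JPEG image", 0, b"\xff\xd8\xff"),
    ("ZIP container", 0, b"PK\x03\x04"),
]


def identify_bytes(data: bytes) -> str:
    """Return the label of the first matching signature, else 'unknown'."""
    for label, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


def identify_file(path: Path, read_size: int = 64) -> str:
    """Identify a file by reading only its first few bytes."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(read_size))
```

Calling `identify_file(Path("some-file"))` on a sample from a test corpus returns one of the labels above or `"unknown"`. Every file that comes back as `"unknown"`, or with the wrong label, points at a signature that is missing or too loose, which is exactly the kind of gap that contributed signatures help to close.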

Improving Characterisation/Metadata Extraction

Deep file characterisation enables validation, identification of preservation risks and extraction of metadata. In developing a new characterisation capability, begin with thorough research to identify existing code to re-use or build on, develop a focused command line tool, then consider turning it into a JHOVE module.
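As a rough sketch of what a focused command line characterisation tool might look like, the Python example below reads the basic properties stored in the IHDR chunk at the start of a PNG file and prints them as JSON. It is a hypothetical illustration rather than an existing tool or a JHOVE module: the function name `characterise_png`, the output field names and the command line interface are all assumptions made for this example.

```python
import argparse
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def characterise_png(data: bytes) -> dict:
    """Extract basic properties from the IHDR chunk of a PNG byte stream."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG: signature missing")
    # Layout after the 8-byte signature: 4-byte chunk length,
    # 4-byte chunk type, then 13 bytes of IHDR data.
    if len(data) < 29 or data[12:16] != b"IHDR":
        raise ValueError("malformed PNG: IHDR chunk missing or truncated")
    (width, height, bit_depth, colour_type,
     compression, filter_method, interlace) = struct.unpack(">IIBBBBB", data[16:29])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "colour_type": colour_type,
        "interlaced": interlace == 1,
    }


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Report basic PNG properties as JSON.")
    # nargs="*" keeps this sketch harmless when run with no arguments.
    parser.add_argument("paths", nargs="*", help="PNG files to characterise")
    args = parser.parse_args(argv)
    for path in args.paths:
        with open(path, "rb") as handle:
            header = handle.read(29)  # only the start of the file is needed
        try:
            result = characterise_png(header)
        except ValueError as error:
            result = {"error": str(error)}
        print(json.dumps({"path": path, **result}))


if __name__ == "__main__":
    main()
```

Reporting malformed files as an error record, rather than crashing, means the tool can be run across a whole test corpus, including deliberately broken files, without stopping at the first bad one.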

The goal is to help the members of the international digital preservation community to find each other, to grow, and to find ways to support each other. Crucially, we want to help pool our knowledge and resources so we can do more and better preservation, and try to avoid anyone re-inventing the wheel. Of course, this ethos also extends to this gateway site, so please raise any issues (e.g. what have we missed?), contribute to this web site, or discuss your ideas with us.

All images sourced from the Noun Project, including: Question image by Henry Ryder, Swiss Army Knife image by Olivier Guin, Add folder image by Sergio Calcara, People image by T. Weber, Cross hairs image by __Lo._ and chain image by Adam Whitcroft.

DigiPres Commons is kindly hosted by the Open Preservation Foundation.