Access, Use and Reuse

Functions that support the DCC Lifecycle Stage defined as "Ensure that data is accessible to both designated users and reusers, on a day-to-day basis. This may be in the form of publicly available published information. Robust access controls and authentication procedures may be applicable."

Access

Tools that facilitate access to digital data by users.

  • ArchivesSpace - ArchivesSpace is the next-generation web-based archives information management system, designed by archivists and supported by diverse archival repositories.
  • Archon - Archon automatically publishes archival descriptive information and digital archival objects in a user-friendly website.
  • CollectiveAccess - CollectiveAccess is web-based software to catalogue, manage, and publish museum and archival collections.
  • DSpace - DSpace is an institutional repository system which enables easy deposit, preservation, and access for all types of digital content.
  • Djatoka - djatoka is open source Java software that builds upon a rich set of APIs and libraries to provide a service framework for the dynamic dissemination of JPEG 2000 image files.
  • EPrints - EPrints is an open access digital repository software, which is intended to create a highly configurable web-based repository.
  • IIPImage - IIPImage is an advanced high-performance imaging server and client for web-based streamed remote visualization of ultra resolution scientific imagery.
  • LOCKSS (Lots of Copies Keep Stuff Safe) - LOCKSS software allows libraries to create preserved digital collections out of materials that would otherwise be accessible only through a licensed academic subscription.
  • Library of Congress Newspaper Viewer - The Library of Congress Newspaper Viewer is a web application used to ingest and view digitized newspaper pages meeting the National Digital Newspaper Program specification.
  • MPP Viewer - MPP Viewer is a viewer for Microsoft Project files
  • Omeka - Omeka is a free open source web-publishing platform for the display of library, museum, archives, and scholarly collections and exhibitions.
  • Recollection - Recollection is a free open source platform for generating and customizing views (interactive maps, timelines, facets, tag clouds) that allow scholars, librarians, and curators to explore digital collections.
  • Rescarta - The ResCarta Tools software empowers users to create non-proprietary digital objects with LOC standard METS, MODS, MIX and AudioMD metadata from existing TIFF, JPEG, PDF and WAV data through user-friendly interfaces. Digital collections can be created, indexed, displayed and validated using the software. Exports DC, OAI_DC formats for use in OAI/PMH servers.
  • Rosetta - Ex Libris Rosetta enables institutions to preserve and provide access to the collections in their care.
  • SIARDexcerpt - SIARDexcerpt is a Java-based application that searches and extracts individual records of SIARD files.
  • Simile Exhibit - Exhibit lets you easily create web pages with advanced text search and filtering functionalities, with interactive maps, timelines, and other visualizations.
  • SobekCM - SobekCM is a digital repository and digital scholarship/publishing system which enables easy deposit, preservation, and access for all types of digital content, tailored to the needs of galleries, libraries, archives, museums, scholars, and researchers.
  • The Open Video Digital Library Toolkit - The Open Video Digital Library Toolkit project is intended to provide museums, libraries and other institutions holding moving image collections tools to more easily create Web-based digital video libraries.
  • Voyeur - Voyeur is a web-based text analysis environment that can use texts in a variety of formats, from different locations to perform lexical analysis, export data to other tools, and embed live tools into remote websites.
  • Wayback Machine - The Wayback Machine is a powerful search and discovery tool for use with collections of Web site "snapshots" collected through Web harvesting, usually with Heritrix (ARC or WARC files).
  • Wayfinder - Wayfinder is a developing resource for students and researchers to use in browsing digital archives.

Annotation

Tools that facilitate annotation of digital data by users.

  • Clipper - Clipper is a free open-source web application enabling researchers to create and share virtual-clips without altering the original media files

Discovery

Tools that facilitate the discovery of digital data by users.

  • EnCase eDiscovery - EnCase eDiscovery is the market leading e-discovery software that enables more efficient business process and significantly reduces legal risk and cost with a judicially accepted solution that provides everything from legal hold to first pass review and is scalable, defensible, and repeatable.
  • Project Blacklight - Blacklight is a free and open source Ruby on Rails based discovery interface.
  • SobekCM - SobekCM is a digital repository and digital scholarship/publishing system which enables easy deposit, preservation, and access for all types of digital content, tailored to the needs of galleries, libraries, archives, museums, scholars, and researchers.
  • TReSy - TReSy is an XML search engine oriented to text retrieval.
  • UpLib - UpLib is a personal digital library system that provides a long-term archival system with powerful search and a visually oriented retrieval mechanism, suitable for a wide variety of personal documents such as papers, photos, receipts, music, Web pages, books, clippings, and email.
  • Wayback Machine - The Wayback Machine is a powerful search and discovery tool for use with collections of Web site "snapshots" collected through Web harvesting, usually with Heritrix (ARC or WARC files).
  • Wayfinder - Wayfinder is a developing resource for students and researchers to use in browsing digital archives.
  • Web Archive Discovery - Indexing and discovery tools for web archives.
  • XPAT - The XPAT engine is an SGML/XML-aware search engine that the University of Michigan has deployed with an extremely diverse set of digital library resources.

Redaction

Tools that support the removal of selected information from digital files. Typically used for removal of sensitive information like telephone or credit card numbers from personal archives before providing access to users.

  • MRU-Blaster - MRU-Blaster is a program made to do one large task - detect and clean MRU (most recently used) lists on your computer.
  • Microsoft Office 2003 Add-in: Word Redaction v1.2 - Use the Word 2003 Redaction Add-in to hide text within Microsoft Office Word 2003 documents.
  • Microsoft Office 2003/XP Add-in: Remove Hidden Data - With this add-in you can permanently remove hidden data and collaboration data, such as change tracking and comments, from Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files.
  • RapidRedact - The RapidRedact product range provides fast, easy to use redaction tools for irreversibly blanking out (redacting) selected information, author's changes and hidden data from all electronic document types.
  • Redact-It - Provides Windows desktop and server redaction of PDF, Word, scanned TIFF images. Find, black out and remove content within documents, images or drawings.
  • Redax - Redax completely redacts (removes) text and graphics from the PDF page.
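The kind of pattern-based removal described for this category can be sketched in a few lines of Python. This is a minimal illustration only, not how any of the tools listed above work: the two regular expressions, the `redact` function and the `[REDACTED]` marker are assumptions made for the example, and real telephone and card number formats vary far more widely. It also handles plain text only; the listed tools additionally deal with hidden data and document formats.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# Ten-digit numbers written as 3-3-4 groups, e.g. 555-123-4567.
PHONE_PATTERN = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def redact(text: str, marker: str = "[REDACTED]") -> str:
    """Replace card-like and phone-like numbers in text with a marker."""
    text = CARD_PATTERN.sub(marker, text)
    return PHONE_PATTERN.sub(marker, text)


if __name__ == "__main__":
    print(redact("Call 555-123-4567 or pay with 4111 1111 1111 1111."))
```

Replacing the matched text outright, rather than hiding or overlaying it, is what makes the removal irreversible in the output.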

Create or Receive (Acquire)

Functions that support the DCC Lifecycle Stage defined as "Create data including administrative, descriptive, structural and technical metadata. Preservation metadata may also be added at the time of creation. Receive data, in accordance with documented collecting policies, from data creators, other archives, repositories or data centres, and if required assign appropriate metadata."

Data capture and Deposit

Tools that enable the capture and deposit of data.

  • Artivity - A tool for capturing contextual data produced during the creative process of artists and designers while working on a computer.
  • Screen-scraper - screen-scraper is a tool for extracting data from websites.
  • Tabula - Extract tabular data from PDF files
  • WARCreate - Google Chrome browser extension for creating WARC files from web pages
  • Web Scraper Plus+ - Web Scraper Plus+ takes data from the web and puts it into a spreadsheet or database.

Disk Imaging

Tools that enable the capture, viewing or extraction of contents of a disk image (which is a computer file containing the contents and structure of a disk volume or an entire data storage device, such as a hard drive or floppy disk).

  • AFF Open Source Computer Forensics Software - Tools for the creation of disk images, used in conjunction with the AFF open and extensible file format to store disk images and associated metadata.
  • CDRDAO (CDR Disk At Once) - Cdrdao records audio or data CD-Rs in disk-at-once (DAO) mode based on a textual description of the CD contents.
  • CloneCD - CloneCD is the perfect tool to make backup copies of your music and data CDs, regardless of copy protection.
  • Dc3dd for computer forensics - dc3dd is a patched version of GNU dd with a number of features useful for computer forensics.
  • Disktype - Tool for detecting the content format of a disk or disk image. It knows about common file systems, partition tables, and boot codes.
  • DriveImage XML - DriveImage XML is an easy to use and reliable program for imaging and backing up partitions and logical drives.
  • Easy CD-DA Extractor - Easy CD-DA Extractor is CD Ripper, Music Converter, Audio Converter, Metadata Editor, and CD/DVD burning software.
  • GetDriveInfo2 - GetDriveInfo2 is a Win32 program that examines the optical and removable media drives currently mounted on a computer, and returns information about those devices (in the case of optical devices it also returns information about any media currently mounted in the device).
  • IMAGE - IMAGE is a DOS application capable of generating either highly compressed or "flat" images for forensic analysis.
  • IsoBuster - Recover data from CD, DVD, BD, HDD, Flash drive, USB stick, media card, SD and SSD.
  • KryoFlux - Floppy disk controller software that accompanies a KryoFlux drive
  • Paranoia - "Use your CDROM drive to read audio tracks.... and have it actually work right!"
  • PhotoRescue - PhotoRescue is the best and fairest picture and data recovery solution for digital film - sd cards, compact flash, memory sticks, microdrive, etc.
  • Power ISO - PowerISO is a powerful CD/DVD image file processing tool, which allows you to open, extract, create, edit, compress, encrypt, split and convert ISO files, and mount these files with internal virtual drive.
  • QPxTool - With QPxTool you can measure the quality of CDs and DVDs.
  • Virtual CloneDrive - Virtual CloneDrive works and behaves just like a physical CD/DVD drive, but it exists only virtually.
  • Zlon HDD cloning and imaging - Zlon is a disk imaging tool.

File Copy

Tools that support the copying of files from one storage location to another, typically with facilities to verify the completeness of the copy and enable resumption of copying after an interruption.

  • BIL (BagIt Library) - BagIt Library is a Java software library that supports the creation, manipulation and validation of bags.
  • BagIt Transfer Utilities - BagIt transfer Utilities are a collection of tools developed for the purpose of validation and transfer of bags.
  • Cp Unix command - cp copies files (or, optionally, directories). Part of GNU coreutils.
  • Cryptcat - Cryptcat is a lightweight version of netcat with integrated transport encryption capabilities.
  • Dcfldd - dcfldd is an enhanced version of GNU dd with features useful for forensics and security.
  • Dd Unix command - This page gives information on using the dd Unix command.
  • XXCopy - XXCopy is an expanded version of Xcopy
  • Xcopy - Xcopy copies files and directories, including subdirectories.
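The verification step mentioned in the category description above can be sketched as a copy followed by a checksum comparison. This is a minimal sketch using only the Python standard library; the function names `sha256_of` and `copy_and_verify` are hypothetical, SHA-256 is an assumed choice of checksum, and resumption after interruption is not covered.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm that both checksums match."""
    expected = sha256_of(source)
    shutil.copy2(source, destination)  # copy2 also tries to keep timestamps
    actual = sha256_of(destination)
    if actual != expected:
        raise IOError(f"checksum mismatch copying {source} to {destination}")
    return actual
```

Reading in chunks keeps memory use flat for large files, and returning the digest lets the caller record it alongside the copy.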

OCR

Tools that support the generation of text from bitmap images, otherwise known as Optical Character Recognition (OCR).

  • Tesseract-ocr - Open source OCR engine, accepting uncompressed TIFF files as input

Web Crawl

Tools that support the capture of data from the world wide web, typically by "crawling" links between resources.

  • Archive-It - Archive-It is the leading web archiving service for collecting and accessing cultural heritage on the web. It is a service provided by the Internet Archive.
  • ArchiveFacebook - ArchiveFacebook is a Firefox extension which allows individuals to save and manage Facebook web content.
  • ContextMiner - ContextMiner is a framework to collect, analyze, and present the contextual information along with the data.
  • Curate.Us - With a simple click of the mouse, you can create visually compelling clips and quotes of web content that are easily embedded in blog posts, email, forums, and websites.
  • DeepArc - Intended for preserving web sites from the back-end, this is a database-to-XML curation tool.
  • Find It! Keep It! - Find It! Keep It! is a tool to save and organise web content.
  • GNU Wget - Non-interactive network downloader
  • HTTrack - HTTrack is a website copying utility.
  • Heritrix - Heritrix is an open-source web crawler, allowing users to target websites they wish to include in a collection and to harvest an instance of each site.
  • Heritrix plug-in for rich media capture - The Rich Media Capture module (RMC), developed in the LiWA (Living Web Archives) project, is designed to enhance the capturing capabilities of the crawler, with regards to different multimedia content types.
  • Metaproducts - Metaproducts offers several commercial capture and off-line browsing tools.
  • NetarchiveSuite - NetarchiveSuite is a web archiving software package designed to plan, schedule and run web harvests of parts of the Internet.
  • NutchWAX - NutchWAX is software for indexing ARC files (archived Web sites gathered using Heritrix) for full text search.
  • PageVault - pageVault supports the archiving of all unique responses generated by a web server.
  • Pagelyzer - Suite of tools for detecting changes in web pages and their rendering
  • RARC (ARC replicator) - rARC is a distributed system that enables Internet users to provide storage space from their computers to replicate small parts of the archived data stored in the central repository of the Web archive.
  • SiteStory - SiteStory is a transactional web archive. It archives resources of a web server it is associated with.
  • Spadix software - Spadix Software can download websites from a starting URL, search engine results or web dirs, and is able to follow external links.
  • Storytracker - Tools for tracking stories on news homepages
  • Tennyson Maxwell Information Systems - Tennyson Maxwell Information Systems offers a variety of features to support multithreaded retrieval, password-protected access, filtering, batch capture, and management of derived databases.
  • The DeDuplicator (Heritrix add-on module) - The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
  • The Nalanda iVia Focused Crawler - The Nalanda iVia Focused Crawler (NIFC) is a focused Web crawler.
  • TubeKit - TubeKit is a toolkit for creating YouTube crawlers.
  • WARCreate - Google Chrome browser extension for creating WARC files from web pages
  • WAS (Web Archiving Service) - The Web Archiving Service (WAS) is a Web-based curatorial tool that enables libraries and archivists to capture, curate, analyze, and preserve Web-based government and political information.
  • WAXToolbar - WAXToolbar is a Firefox extension to help users with common tasks encountered surfing a web archive.
  • WCT (Web Curator Tool) - Web Curator Tool (WCT) is a workflow management application for selective web archiving.
  • WERA (Web ARchive Access) - WERA (Web ARchive Access) is a freely available solution for searching and navigating archived web document collections.
  • WarcManager - The WARC Manager is a web-based UI for managing and querying collections of web crawl data.
  • Warrick - Warrick is a free utility for reconstructing (or recovering) a website from web archives.
  • Wayback Machine - The Wayback Machine is a powerful search and discovery tool for use with collections of Web site "snapshots" collected through Web harvesting, usually with Heritrix (ARC or WARC files).
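The idea of "crawling" links between resources, as described for this category, can be sketched as a breadth-first traversal. The example below is a toy under stated assumptions and does not reflect how Heritrix or any other listed tool is implemented: the `LinkCollector` class, the `crawl` function and the in-memory `SITE` mapping are hypothetical, and the `fetch` callable stands in for real HTTP retrieval so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href values of anchor tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, limit=100):
    """Breadth-first crawl; fetch(url) returns HTML text, or None if missing."""
    visited = []
    queued = {start_url}
    queue = deque([start_url])
    while queue and len(visited) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable resource: skip it
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in queued:
                queued.add(target)
                queue.append(target)
    return visited


# Hypothetical three-page site used in place of real HTTP requests.
SITE = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.org/a": '<a href="/">home</a>',
    "http://example.org/b": '<a href="/missing">gone</a>',
}

if __name__ == "__main__":
    print(crawl("http://example.org/", SITE.get))
```

Tracking every queued URL in a set is what stops the crawler from looping when pages link back to each other.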

Web Snapshot

Tools that support the capture of a static snapshot of a web page.

  • Khtml2png - khtml2png is a command line program to create screenshots of webpages.
  • Pearl Crescent Page Saver - Pearl Crescent Page Saver is an extension for Mozilla Firefox that lets you capture images of web pages, including Flash content.
  • Snagit - Snagit is screen capture software to create interesting training documents, collaborative design work, IT bug reports, and more.
  • WARCreate - Google Chrome browser extension for creating WARC files from web pages
  • WebShot - WebShot allows you to take screenshots of web pages and save them as full sized images or thumbnails.
  • Webkit2png - webkit2png is a command line tool that creates png screenshots of webpages.

Workflow and Lab Notebook Management

Tools that support the capture and management of research data as well as the details of the research activities which generated them.

  • Artivity - A tool for capturing contextual data produced during the creative process of artists and designers while working on a computer.
  • CRunch - cRunch provides an infrastructure for exploratory data analysis with the statistical programming language and environment R
  • Kepler - Kepler is a scientific workflow modelling and management system that enables users, regardless of programming experience, to set up data analysis pipelines.
  • LabTrove - LabTrove is a blogging platform specifically designed for use in a research environment.
  • MyExperiment - myExperiment is an online social networking service aimed at scientific researchers; the site fosters collaboration by allowing members to share scientific workflows, experiment plans, and other digital objects.
  • Taverna - Taverna is a scientific workflow management system designed to assemble, run, document and share sequences of web services and scripts.

Cross-Lifecycle Functions

Functions that operate across the digital lifecycle and therefore cannot be easily categorised by DCC Lifecycle Stage.

Academic Social Networking

Tools that support making connections, sharing research and maximising the impact of digital data.

  • Mendeley - Mendeley is a combination web service and desktop application that allows users to create, manage, and share collections of references.
  • MyExperiment - myExperiment is an online social networking service aimed at scientific researchers; the site fosters collaboration by allowing members to share scientific workflows, experiment plans, and other digital objects.
  • ResearchGate - ResearchGate is an online professional network for scientists and researchers, particularly employed by those wishing to follow and track the publication outputs of others in their field.

Binary & Hexadecimal Editing

Tools for viewing and editing files in different views, such as binary or hexadecimal. These are typically known as hex editors.

  • Bless - Bless is a high quality, full featured hex editor.
  • Hex Workshop - The Hex Workshop Hex Editor by BreakPoint Software is a complete set of hexadecimal development tools for Microsoft Windows 2000 and later.
  • HxD - Free Hex- and Ram-Editor
  • WxHexEditor - A free hex editor / disk editor
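As an illustration of the hexadecimal view these editors present, the following sketch prints bytes as an offset column, hex values and a printable-text column. It is a read-only toy, not a description of any listed product; the function name `hex_dump` and the 16-bytes-per-row layout are assumptions for the example.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: offset, hex values, printable characters."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separators
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{hex_width}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hex_dump(b"%PDF-1.4\n"))
```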

Forensic

Tools that support forensics related functions.

  • AFFLIB - The Advanced Forensics Format (AFF) and AFF Library (AFFLIB) are a joint development project of Simson L.
  • Autopsy Forensic Browser - graphical interface to the command line digital investigation tools in The Sleuth Kit
  • DataLifter - suite of tools "designed to assist with Computer Forensics, Information Auditing, Information Security and Data Recovery"
  • Dc3dd for computer forensics - dc3dd is a patched version of GNU dd with a number of features useful for computer forensics.
  • Dcfldd - dcfldd is an enhanced version of GNU dd with features useful for forensics and security.
  • Digital Intelligence Forensic Software - Digital Intelligence Forensic Software
  • EnCase Forensic (Guidance Software) - EnCase Forensic (Guidance Software)
  • FCCU GNU/Linux Forensic Boot CD - bootable CD with Linux and forensic tools
  • FTK (Forensic Toolkit) - Forensic Toolkit (AccessData)
  • Farmer's Boot CD (FBCD) - bootable CD with Linux and forensic tools
  • Foremost - Foremost is a console program to recover files based on their headers, footers, and internal data structures.
  • Forensic Acquisition Utilities - A collection of utilities and libraries intended for forensic or forensic-related investigative use in a modern Microsoft Windows environment.
  • Freeware Hex Editor XVI32 - XVI32 is a freeware hex editor running under Windows 95, Windows 98, Windows NT, Windows 2000, and Windows XP.
  • Gumshoe - Search interface for metadata extracted from forensic disk images.
  • HashKeeper - Digital Evidence Laboratory specialists created the HashKeeper software in 1998 to expedite the analysis of electronic media by reducing the number of files to be analyzed during the course of an investigation.
  • Helix (e-fense) - bootable CD with Linux and forensic tools
  • Hex Workshop - The Hex Workshop Hex Editor by BreakPoint Software is a complete set of hexadecimal development tools for Microsoft Windows 2000 and later.
  • I2 - i2 is a provider of intelligence and investigation management software for law enforcement, defense, national security and private sector organizations.
  • ILookPI - ILookPI provides a fully programmable IDE environment with customizable tool capabilities.
  • Index.dat Analyzer v2.5 - Index.dat Analyzer is a tool to view, examine and delete contents of index.dat files.
  • InfinaDyne - InfinaDyne's forensic products are focused on government and law enforcement examining various types of media and intent on collecting evidence in a thorough, secure and trustworthy manner.
  • KEA (Keyphrase Extraction Algorithm) - KEA is an algorithm for extracting keyphrases from text documents.
  • Libewf - Libewf is a library that supports the Expert Witness Compression Format (EWF), in both its SMART (EWF-S01) and EnCase (EWF-E01) variants.
  • MRU-Blaster - MRU-Blaster is a program made to do one large task - detect and clean MRU (most recently used) lists on your computer.
  • McAfee Free Tools - Free Tools [See specifically Forensic Tools]
  • Microsoft Office 2003 Add-in: Word Redaction v1.2 - Use the Word 2003 Redaction Add-in to hide text within Microsoft Office Word 2003 documents.
  • Microsoft Office 2003/XP Add-in: Remove Hidden Data - With this add-in you can permanently remove hidden data and collaboration data, such as change tracking and comments, from Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files.
  • NSRL (National Software Reference Library) - The NSRL provides a large data set of metadata on computer files which can be used to identify the files and their provenance
  • OCFA (Open Computer Forensics Architecture) - Open Computer Forensics Architecture is a modular computer forensics framework.
  • Paraben - Paraben provides forensics tools.
  • PyFlag - FLAG (Forensic and Log Analysis GUI) is an advanced forensic tool for the analysis of large volumes of log files and forensic investigations.
  • RAID (Real-time Analytical Intelligence Database) - RAID is a relational database used to record key pieces of information and to quickly identify links among people, places, businesses, financial accounts, telephone numbers, and other investigative information.
  • RapidRedact - The RapidRedact product range provides fast, easy to use redaction tools for irreversibly blanking out (redacting) selected information, author's changes and hidden data from all electronic document types.
  • Redact-It - Provides Windows desktop and server redaction of PDF, Word, scanned TIFF images. Find, black out and remove content within documents, images or drawings.
  • Redax - Redax completely redacts (removes) text and graphics from the PDF page.
  • Regshot - Regshot is an open-source (GPL) registry compare utility that allows you to quickly take a snapshot of your registry and then compare it with a second one - done after doing system changes or installing a new software product.
  • Technology Pathways - Technology Pathways, LLC is a leading edge provider of computer security tools and services for the Corporate IT, government and legal communities.
  • The Carve Path Zero-storage Library and filesystem - LibCarvPath is a library for computer forensics carving tools.
  • The PERPOS Tools: User's Guide - The Archival Repository Tool (ART) is a prototype software tool designed to support archivists in accessing and describing file systems containing electronic records.
  • The Sleuth Kit - Collection of command line computer forensics digital investigation tools.
  • WinHex - WinHex is in its core a universal hexadecimal editor, particularly helpful in the realm of computer forensics, data recovery, low-level data processing, and IT security.
  • Windows IR/CF Tools - This page links to Windows IR/CF Tools.
  • Yara - Pattern matching tool

Metadata Extraction

Tools that support the extraction of metadata from files.

  • Apache PDFBox - Java PDF library for creation, manipulation, validation and content extraction of PDF documents
  • Apache POI - the Java API for Microsoft Documents - The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2).
  • Apache Tika - Java based tool for identifying file formats using signatures and extracting metadata and text content from documents.
  • BWF MetaEdit - BWF MetaEdit permits embedding, validating, and exporting of metadata in Broadcast WAVE Format (BWF) files.
  • Brunnhilde - Siegfried-based characterization of directories and disk images
  • C3PO - C3PO is a content profiling tool for visualization and preservation analysis
  • DROID (Digital Record Object Identification) - DROID (Digital Record Object Identification) is a software tool developed to perform automated batch identification of file formats.
  • DROID sqlite analysis - Analysis and automatic generation of summary information from DROID output
  • DUMPBIN Utility - The DUMPBIN utility, which is provided with the 32-bit version of Microsoft Visual C++, combines the abilities of the LINK, LIB, and EXEHDR utilities.
  • Disktype - Tool for detecting the content format of a disk or disk image. It knows about common file systems, partition tables, and boot codes.
  • EMET (Embedded Metadata Extraction Tool) - EMET is a stand-alone tool designed to extract metadata embedded in JPEG and TIFF files.
  • EPADD - ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
  • EXE Explorer - EXE Explorer reads and displays executable file properties and structure.
  • EXIF to DC XML normaliser - Extract EXIF data and normalise it to DC XML.
  • Easy CD-DA Extractor - Easy CD-DA Extractor is CD Ripper, Music Converter, Audio Converter, Metadata Editor, and CD/DVD burning software.
  • EpubCheck - Validator for EPUB files
  • Exempi - Exempi is a library for handling XMP metadata, based on the Adobe XMP SDK
  • ExifTool - Properties extraction, identification, metadata editing
  • FIDO (Format Identification for Digital Objects) - A PRONOM based, command line, file format identification tool written in Python
  • FIDOO - A PRONOM based, online file format identification tool written in Javascript and HTML5
  • FITS (File Information Tool Set) - FITS allows data curators to identify, validate, and extract technical metadata for the objects in their digital repository.
  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • FileAlyzer - FileAlyzer allows a basic analysis of files (showing file properties and file contents in hex dump form) and is able to interpret common file contents like resources structures (like text, graphics, HTML, media and PE).
  • GNU libextractor - GNU libextractor is a library used to extract meta data from files of arbitrary type.
  • GetID3() - Extracts technical and embedded descriptive metadata from common multimedia file formats.
  • IText - PDF library for manipulation, content extraction and creation
  • Index.dat Analyzer v2.5 - Index.dat Analyzer is a tool to view, examine and delete contents of index.dat files.
  • JHOVE (Harvard Object Validation Environment) - JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
  • JHOVE2 - JHOVE2 allows data curators to characterise the digital objects in their repositories.
  • JWAT - Java Web Archive Toolkit
  • Jp2StructCheck - Simple JP2 file structure checker
  • Jpylyzer - JP2 validation + properties extraction
  • Keith Humphreys' PhraseRate - PhraseRate is a program, developed by Keith Humphreys, for extracting a set of meaningful, attractive keywords and key phrases from a web page describing the content of that page.
  • Lingfo - Lingfo provides a library for developers to use to extract information from Microsoft Excel spreadsheet files.
  • MP3::Tag - MP3::Tag is a module for reading tags of MP3 audio files.
  • Mdqc - Tool for managing and comparing digital asset metadata
  • MediaInfo - Supplies technical and tag information about a video or audio file.
  • Metadata Extraction Tool - Metadata Extraction Tool automatically extracts a limited set of metadata from the headers of digital files.
  • NARA File Analyzer and Metadata Harvester - NARA File Analyzer and Metadata Harvester allows a user to analyze the contents of a file system or external drive and generates statistics about the contents of the contained directories.
  • NARA Video Frame Analyzer - NARA Video Frame Analyzer analyzes technical properties of individual frames of a video file in order to detect quality issues within digitized video files.
  • Nanite - A friendly swarm of format-identifying robots
  • ODF Validator - ODF Validator is a tool that validates OpenDocument files and checks them for certain conformance criteria.
  • Officeparser.py - officeparser.py is a Python script that parses the format of OLE compound documents used by Microsoft Office applications.
  • OpenJPEG - The OpenJPEG library is an open-source JPEG 2000 codec written in C language.
  • PDF Tools (by Didier Stevens) - Tools for parsing and analysing PDF documents
  • PERICLES Extraction Tool (PET) - A tool to capture contextual information in a sheer curation scenario
  • Pagelyzer - Suite of tools for detecting changes in web pages and their rendering
  • Pdftk - PDF manipulation tool
  • Peepdf - peepdf is a Python tool to explore PDF files in order to find out if the file can be harmful or not.
  • Python XMP Toolkit - Library for working with XMP metadata, as well as reading/writing XMP metadata stored in many different file formats
  • Qpdf - QPDF is a command-line program that does structural, content-preserving transformations on PDF files
  • Warctools - Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
  • Web Archive Discovery - Indexing and discovery tools for web archives.
  • WordHoard - WordHoard is an application for the close reading and scholarly analysis of deeply tagged texts.
  • Xpdf - Open source PDF viewer that includes PDF information extractor and font analyzer

Personal Archiving

Tools that support the preservation and archiving of data relating to individuals.

  • Muse - A tool used for personal archiving of email.
  • Rescarta - The ResCarta Tools software empowers users to create non-proprietary digital objects with LOC standard METS, MODS, MIX and AudioMD metadata from existing TIFF, JPEG, PDF and WAV data through user-friendly interfaces. Digital collections can be created, indexed, displayed and validated using the software. Exports DC and OAI_DC formats for use in OAI-PMH servers.
  • WARCreate - Google Chrome browser extension for creating WARC files from web pages

Preservation System

Tools that support the management and preservation of digital resources, typically performing a number of functions across the digital lifecycle such as ingest, storage, preservation action and access.

  • ADIGRES - ADIGRES is a powerful cross-platform Document Management System written in Java.
  • Archivematica - Archivematica is a digital preservation system that automates the process of preparing digital objects for ingest into a repository and an access system
  • CONTENTdm - CONTENTdm is a digital collection management system
  • CollectiveAccess - CollectiveAccess is web-based software to catalogue, manage, and publish museum and archival collections.
  • Curator's Workbench - Curator's Workbench is a tool that automates and streamlines the process of preparing collections of digital materials for submission to a repository
  • DAITSS - A digital preservation software application designed as a dark archive to serve consortial and institutional preservation repositories in a multi-user environment. DAITSS is considered to be a first-party system.
  • DCape (ingest only) - "The goal of the DCAPE project is to build a distributed production preservation environment that meets the needs of archival repositories for trusted archival preservation services." (Note: This is a work in progress, see notes for more information)
  • DPSP (Digital Preservation Software Platform) - The DPSP is a collection of four software applications which support the goal of digital preservation.
  • DSpace - DSpace is an institutional repository system which enables easy deposit, preservation, and access for all types of digital content.
  • Data Vault - A storage broker and front end for archiving research data that is no longer active but that does not have a need for open publication
  • DataFlow - DataFlow is a two-stage data management infrastructure that is designed to allow researchers to work with, annotate, publish, and permanently store research data.
  • DataStage - DataStage is a flexible data storage system that provides controlled access, secure backup, and the ability to transfer selected files to a more permanent archiving facility.
  • Dataverse - The Dataverse is an open source web application to share, preserve, cite, explore and analyze research data.
  • Digital Preservation Recorder - Digital Preservation Recorder (DPR) is free and open source software developed by the National Archives of Australia to aid in the long term preservation of digital records.
  • Duke Data Accessioner - Data Accessioner provides a graphical user interface to aid in migrating data from physical media to a dedicated file server, documenting the process and using MD5 checksums to identify any errors introduced in transfer.
  • EPADD - ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
  • EPrints - EPrints is an open access digital repository software, which is intended to create a highly configurable web-based repository.
  • Fedora Commons - Fedora provides the back-end foundation for digital repository systems responsible for managing and preserving all types of digital content.
  • GFI MailArchiver - GFI MailArchiver is an email archiving software that is the single solution source for your email management problems on Exchange Server.
  • HP Integrated Archive Platform - The HP Integrated Archive Platform (HP IAP) provides a solution for the long-term archival and disposition of information.
  • Hoppla - Hoppla is an archiving solution that combines back-up and fully automated migration services for data collections in small office environments.
  • IRODS (integrated Rule Oriented Data Systems) - iRODS software was designed to allow curators utilising heterogeneous storage and computing facilities to define policies without being concerned with the technical detail of how the system implements those policies and without having to respond to changes in technical infrastructure.
  • InBoxer - InBoxer is a next generation email archiving, IM archiving, e-discovery, and policy management system.
  • Invenio - Invenio is a free software suite enabling you to run your own digital library or document repository on the web.
  • KOST-Simy - The KOST-Simy application is used to compare images.
  • KOST-Val - KOST-Val is an open source validator for different file formats (TIFF, SIARD, PDF/A, JP2, JPEG) and Submission Information Package (SIP).
  • KoLibRI (Kopal Library for Retrieval and Ingest) - The kopal Library for Retrieval and Ingest (koLibRI) represents a library of Java tools that have been developed for the interaction with the DIAS system of IBM within the kopal project.
  • LOCKSS (Lots of Copies Keep Stuff Safe) - LOCKSS software allows libraries to create preserved digital collections out of materials that would otherwise be accessible only through a licensed academic subscription.
  • Libsafe - libsafe allows organizations to create a fully OAIS-compliant archive, including active and passive digital preservation workflows, and is particularly suited to master image files from digitization processes.
  • Merritt Repository Service - Merritt is a new cost-effective repository service from the University of California Curation Center (UC3) that lets the UC community manage, archive, and share its valuable digital content.
  • OpenWMS: Workflow Management System for Digital Objects - The OpenWMS is a platform-independent, open source, web-accessible system that can be used as a standalone application or integrated with other repository architectures by a wide range of organizations.
  • Preservica - Preservica is a complete OAIS Digital Preservation system available on the cloud (hosted in US, EU or AUS) and on premise (Standard and Enterprise versions). It is trusted by over 50 organisations across 4 continents to preserve collections both large (>6Pb) and small (few 100kb)
  • Proofpoint Enterprise Archive: SaaS Email Archiving - Proofpoint Enterprise Archive is a SaaS email archiving solution that addresses three key challenges—eDiscovery, regulatory compliance and email storage management—without the headaches of managing archiving in-house.
  • ReDBox - ReDBox and Mint are two complementary applications designed to create, store, and provide access to research metadata.
  • Rescarta - The ResCarta Tools software empowers users to create non-proprietary digital objects with LOC standard METS, MODS, MIX and AudioMD metadata from existing TIFF, JPEG, PDF and WAV data through user-friendly interfaces. Digital collections can be created, indexed, displayed and validated using the software. Exports DC and OAI_DC formats for use in OAI-PMH servers.
  • Roda - RODA - Repository of Authentic Digital Objects
  • Rosetta - Ex Libris Rosetta enables institutions to preserve and provide access to the collections in their care.
  • SobekCM - SobekCM is a digital repository and digital scholarship/publishing system which enables easy deposit, preservation, and access for all types of digital content, tailored to the needs of galleries, libraries, archives, museums, scholars, and researchers.
  • The Dataverse Network Project - The Dataverse Network Project is an open-source application for publishing, citing and discovering research data.
  • The Hydra Project - Hydra is a multi-institutional, multi-functional, multi-purpose, technical and community framework.
  • The Open Video Digital Library Toolkit - The Open Video Digital Library Toolkit project is intended to provide museums, libraries and other institutions holding moving image collections tools to more easily create Web-based digital video libraries.
  • XArch - XArch is an archive management system that allows one to create, populate, and query archives of multiple database versions.

Service

  • Amazon Cloud - Amazon Cloud is an internet-based storage location designed to hold files indefinitely.
  • Archive-It - Archive-It is the leading web archiving service for collecting and accessing cultural heritage on the web. It is a service provided by the Internet Archive.
  • Carbonite - an online backup service that automatically backs up documents, e-mails, music, photos, and settings. Info gathered early March 2013.
  • Chronopolis - "Chronopolis digital preservation network provides services for the long-term preservation and curation of America's digital holdings"
  • DMPonline - DMPonline is the DCC's data management planning tool.
  • Dropbox - Dropbox is a free service that lets you bring all your photos, docs, and videos anywhere. This means that any file you save to your Dropbox will automatically save to all your computers, phones and even the Dropbox website. Dropbox also makes it super easy to share with others, whether you're a student or professional, parent or grandparent. Even if you accidentally spill a latte on your laptop, have no fear! You can relax knowing that Dropbox always has you covered, and none of your stuff will ever be lost.
  • DuraCloud - DuraCloud is a hosted service that provides a centralised interface for organizations interested in using cloud storage as a part of their digital archiving and preservation programs.
  • Glacier (Amazon) - Amazon Glacier is a secure, durable, and extremely low-cost cloud storage service for data archiving and long-term backup.
  • Google Cloud - Google Cloud Storage allows users to store, access, and manage their data.
  • NESSTAR - Nesstar suite is an online publishing platform for organisations wishing to share datasets both internally and with the wider web.
  • Preservica - Preservica is a complete OAIS Digital Preservation system available on the cloud (hosted in US, EU or AUS) and on premise (Standard and Enterprise versions). It is trusted by over 50 organisations across 4 continents to preserve collections both large (>6Pb) and small (few 100kb)
  • RackSpace - RackSpace provides cloud-based services to businesses of all sizes throughout the world.
  • WAS (Web Archiving Service) - The Web Archiving Service (WAS) is a Web-based curatorial tool that enables libraries and archivists to capture, curate, analyze, and preserve Web-based government and political information.

Version Control

Tools that support the tracking of changes to digital files over time.

Workflow

Tools that support the orchestration and management of specific tools or processes in a workflow.

  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • MyExperiment - myExperiment is an online social networking service aimed at scientific researchers; the site fosters collaboration by allowing members to share scientific workflows, experiment plans, and other digital objects.
  • Taverna - Taverna is a scientific workflow management system designed to assemble, run, document and share sequences of web services and scripts.

Dispose

Functions that support the DCC Lifecycle stage defined as "Dispose of data, which has not been selected for long-term curation and preservation in accordance with documented policies, guidance or legal requirements. Typically data may be transferred to another archive, repository, data centre or other custodian. In some instances data is destroyed. The data's nature may, for legal reasons, necessitate secure destruction."

Redaction

Tools that support the removal of selected information from digital files. Typically used for removal of sensitive information like telephone or credit card numbers from personal archives before providing access to users.

  • MRU-Blaster - MRU-Blaster is a program made to do one large task - detect and clean MRU (most recently used) lists on your computer.
  • Microsoft Office 2003 Add-in: Word Redaction v1.2 - Use the Word 2003 Redaction Add-in to hide text within Microsoft Office Word 2003 documents.
  • Microsoft Office 2003/XP Add-in: Remove Hidden Data - With this add-in you can permanently remove hidden data and collaboration data, such as change tracking and comments, from Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files.
  • RapidRedact - The RapidRedact product range provides fast, easy to use redaction tools for irreversibly blanking out (redacting) selected information, author's changes and hidden data from all electronic document types.
  • Redact-It - Provides Windows desktop and server redaction of PDF, Word, scanned TIFF images. Find, black out and remove content within documents, images or drawings.
  • Redax - Redax completely redacts (removes) text and graphics from the PDF page.
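As a rough illustration of the pattern-based redaction described above, the following Python sketch masks phone-like and card-like digit runs in plain text. The two patterns and the function name redact are assumptions made for this example, not taken from any listed tool. Masking a string is also not the same as irreversibly removing hidden data from a document format, which is what the dedicated tools aim to do.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real redaction
# tools apply far more careful, locale-aware rules.
PHONE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def redact(text: str, mask: str = "[REDACTED]") -> str:
    """Replace card-like and phone-like digit runs with a mask."""
    text = CARD.sub(mask, text)  # longer runs first
    return PHONE.sub(mask, text)
```

A reviewer would still need to check the output by hand, since simple patterns both miss variants and flag harmless numbers.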

Secure Deletion

Tools that support deletion of data in a way that cannot be reversed, typically to avoid third parties stealing sensitive information from decommissioned or recycled hardware.

  • BCWipe - BCWipe data wiping software enables you to permanently delete selected files so that they can never be recovered or undeleted.
  • CCleaner - CCleaner is a tool for cleaning Windows PCs.
  • Darik's Boot And Nuke - Darik's Boot and Nuke ("DBAN") is a self-contained boot disk that securely wipes the hard disks of most computers.
  • Disk Utility - In Disk Utility in Mac OS X 10.
  • Eraser - Eraser is an advanced security tool for Windows which allows you to completely remove sensitive data from your hard drive by overwriting it several times with carefully selected patterns.
  • Ontrack Eraser Software - Ontrack Eraser software is an easy-to-use, highly flexible data erasure tool that erases all traces of data stored on a targeted media - ensuring that sensitive information does not fall into the wrong hands.
  • PDWIPE (Physical Drive WIPE) - PDWIPE (Physical Drive WIPE) is a standalone DOS utility to wipe (zero) an entire physical hard drive.
  • SDelete v1.51 - SDelete is a command line utility that takes a number of options.
  • Secure Deletion - Secure deletion involves the use of special software to ensure that when you delete a file, there really is no way to get it back again.
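Several of the tools above work by overwriting data before removing it. A minimal Python sketch of that idea for a single file follows; the function name and the default of three passes are assumptions for illustration. On SSDs, journaling or copy-on-write file systems, and anywhere snapshots or backups exist, an in-place overwrite does not guarantee that the old bytes are gone, so this should not be read as equivalent to the dedicated tools.

```python
import os


def overwrite_and_delete(path: str, passes: int = 3, chunk: int = 1 << 20) -> None:
    """Overwrite a file's bytes in place with random data, then unlink it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as fh:
        for _ in range(passes):
            fh.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                fh.write(os.urandom(n))
                remaining -= n
            # Ask the OS to push this pass to the storage device.
            fh.flush()
            os.fsync(fh.fileno())
    os.remove(path)
```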

Ingest

Functions that support the DCC Lifecycle stage defined as "Transfer data to an archive, repository, data centre or other custodian. Adhere to documented guidance, policies or legal requirements."

Dependency Analysis

Tools for identifying essential information that resides externally to a digital object, or for identifying dependent processes such as which DLLs are required by a Windows process.

  • Dependency Discovery Tool - The Dependency Discovery Tool searches through binary office files (.doc, .xls and .ppt) and tries to find any documents or files that are linked to the document.
  • Nuclear Processor - Process/module manager for Windows, with features such as Kill/Resume/Suspend thread of a process and unload DLL files
  • PDF Tools (by Didier Stevens) - Tools for parsing and analysing PDF documents
  • PERICLES Extraction Tool (PET) - A tool to capture contextual information in a sheer curation scenario

Encryption Detection

Tools that support the detection of encryption or password protection in files.

  • Apache PDFBox - Java PDF library for creation, manipulation, validation and content extraction of PDF documents
  • Apache POI - the Java API for Microsoft Documents - The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2).
  • EpubCheck - Validator for EPUB files
  • FITS (File Information Tool Set) - FITS allows data curators to identify, validate, and extract technical metadata for the objects in their digital repository.
  • Flint - Validates a file against a policy, using common validation tools
  • JHOVE (Harvard Object Validation Environment) - JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
  • JHOVE2 - JHOVE2 allows data curators to characterise the digital objects in their repositories.
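One simple signal of password protection is the /Encrypt entry that an encrypted PDF carries in its trailer. The Python sketch below looks for that marker in raw bytes. It is a heuristic written for this page (the function name is an assumption), it can produce false positives if the string appears elsewhere in the file, and it does not parse the PDF structure the way the libraries listed above do.

```python
def pdf_declares_encryption(data: bytes) -> bool:
    """Heuristic: True if the bytes start like a PDF and mention /Encrypt."""
    if not data.startswith(b"%PDF-"):
        return False
    # Encrypted PDFs reference an encryption dictionary via an /Encrypt key.
    # A plain byte search is used here; no object or stream parsing is done.
    return b"/Encrypt" in data
```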

File Format Identification

Tools that enable the automatic identification of the file format of a particular file, typically by examining characteristic codes (often termed file format magic) in the file header.

  • Apache Tika - Java based tool for identifying file formats using signatures and extracting metadata and text content from documents.
  • DROID (Digital Record Object Identification) - DROID (Digital Record Object Identification) is a software tool developed to perform automated batch identification of file formats.
  • DUMPBIN Utility - The DUMPBIN utility, which is provided with the 32-bit version of Microsoft Visual C++, combines the abilities of the LINK, LIB, and EXEHDR utilities.
  • FIDO (Format Identification for Digital Objects) - A PRONOM based, command line, file format identification tool written in Python
  • FIDOO - A PRONOM based, online file format identification tool written in JavaScript and HTML5
  • FITS (File Information Tool Set) - FITS allows data curators to identify, validate, and extract technical metadata for the objects in their digital repository.
  • Fine Free File Command - This is the home page for the open source implementation of the file(1) command that ships with every free operating system (OpenBSD, Linux, NetBSD, FreeBSD, etc.).
  • Gvfs-info - gvfs-info - print information about files and directories
  • JHOVE (Harvard Object Validation Environment) - JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
  • JHOVE2 - JHOVE2 allows data curators to characterise the digital objects in their repositories.
  • Libmagic-dev - This library can be used to classify files according to magic number tests.
  • Libsharedmime - This is an implementation for libsharedmime.
  • Media conch - Media Conch is an implementation checker, policy checker and fixer for audiovisual files with a focus on Matroska, LPCM and FFV1.
  • NARA File Analyzer and Metadata Harvester - NARA File Analyzer and Metadata Harvester allows a user to analyze the contents of a file system or external drive and generates statistics about the contents of the contained directories.
  • Nanite - A friendly swarm of format-identifying robots
  • Officeparser.py - officeparser.py is a Python script that parses the format of OLE compound documents used by Microsoft Office applications.
  • Ohcount - Analyses plain text files, looking for code (scripting languages etc.)
  • PRONOM Signature Development Utility - Output DROID compatible file format signature files using PRONOM syntax
  • Siegfried - A PRONOM based, command line, file format identification tool using Aho-Corasick matching and no buffer limits.
  • TrID File Identifier - TrID is a utility designed to identify file types from their binary signatures.
  • Web Archive Discovery - Indexing and discovery tools for web archives.
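The header-based identification described at the top of this category can be sketched in a few lines of Python. The signature table below is a small hand-picked sample of well-known magic numbers, and the function names are assumptions for this example. Tools such as DROID rely on much larger signature registries with offsets and container rules, which this sketch does not attempt.

```python
# A tiny sample of leading-byte signatures ("file format magic").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound document"),
]


def identify(header: bytes) -> str:
    """Return a format label for the given leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and identify them."""
    with open(path, "rb") as fh:
        return identify(fh.read(16))
```

Note that a ZIP signature alone cannot distinguish a plain archive from formats built on ZIP, such as OOXML or EPUB, which is one reason real identifiers look beyond the header.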

Fixity

Tools that support the verification of file fixity, typically through the generation and validation of checksum based manifests.

  • ACE (Audit Control Environment) - The Auditing Control Environment is a mature set of software designed to help libraries and archives prove their holdings are intact and trustworthy.
  • BIL (BagIt Library) - BagIt Library is a Java software library that supports the creation, manipulation and validation of bags.
  • BagIt Transfer Utilities - BagIt transfer Utilities are a collection of tools developed for the purpose of validation and transfer of bags.
  • Bagger - GUI application to facilitate the creation and verification of BagIt bags.
  • Cksum Unix command - cksum computes a cyclic redundancy check (CRC) checksum for each given file, or standard input if none are given
  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • FileVerifier++ - Windows utility for verifying file contents
  • Fixi - Fixi is a command-line utility that indexes, verifies, and updates checksum information for collections of files.
  • Fixity - Fixity monitoring for small-medium collections
  • Md5deep and hashdeep - md5deep is a set of programs to compute MD5, SHA-1, SHA-256, Tiger, or Whirlpool message digests on an arbitrary number of files. hashdeep is a program to compute, match, and audit hashsets.
  • Md5sum Unix command - md5sum computes a 128-bit checksum (or fingerprint or message-digest) for each specified file.
  • Md5summer - MD5summer is an application for Microsoft Windows 9x, NT, ME, 2000 and XP which generates and verifies md5 checksums.
  • NARA File Analyzer and Metadata Harvester - NARA File Analyzer and Metadata Harvester allows a user to analyze the contents of a file system or external drive and generates statistics about the contents of the contained directories.
  • Python checkm package - This is a Python implementation of the checkm specification.
  • Rhash - RHash (Recursive Hasher) is a console utility for computing and verifying hash sums of files.
  • SAFE Archive Audit System - Policy-based replication and Auditing of LOCKSS networks.
  • SSDeep - Recursive piecewise hashing tool
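A checksum manifest of the kind mentioned at the top of this category can be sketched with the Python standard library alone. The function names, the choice of SHA-256, and the in-memory dictionary used as the manifest are assumptions for this example. Tools such as Bagger or Fixi store manifests in their own on-disk formats.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose checksum changed."""
    failures = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel)
    return failures
```

Building the manifest at ingest and re-running the verification on a schedule is the basic pattern; an empty list from verify_manifest means every recorded file is present and unchanged.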

Metadata Extraction

Tools that support the extraction of metadata from files.

  • Apache PDFBox - Java PDF library for creation, manipulation, validation and content extraction of PDF documents
  • Apache POI - the Java API for Microsoft Documents - The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2).
  • Apache Tika - Java based tool for identifying file formats using signatures and extracting metadata and text content from documents.
  • BWF MetaEdit - BWF MetaEdit permits embedding, validating, and exporting of metadata in Broadcast WAVE Format (BWF) files.
  • Brunnhilde - Siegfried-based characterization of directories and disk images
  • C3PO - C3PO is a content profiling tool for visualization and preservation analysis
  • DROID (Digital Record Object Identification) - DROID (Digital Record Object Identification) is a software tool developed to perform automated batch identification of file formats.
  • DROID sqlite analysis - Analysis and automatic generation of summary information from DROID output
  • DUMPBIN Utility - The DUMPBIN utility, which is provided with the 32-bit version of Microsoft Visual C++, combines the abilities of the LINK, LIB, and EXEHDR utilities.
  • Disktype - Tool for detecting the content format of a disk or disk image. It knows about common file systems, partition tables, and boot codes.
  • EMET (Embedded Metadata Extraction Tool) - EMET is a stand-alone tool designed to extract metadata embedded in JPEG and TIFF files.
  • EPADD - ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
  • EXE Explorer - EXE Explorer reads and displays executable file properties and structure.
  • EXIF to DC XML normaliser - Extract EXIF data and normalise it to DC XML.
  • Easy CD-DA Extractor - Easy CD-DA Extractor is CD Ripper, Music Converter, Audio Converter, Metadata Editor, and CD/DVD burning software.
  • EpubCheck - Validator for EPUB files
  • Exempi - Exempi is a library for handling XMP metadata, based on the Adobe XMP SDK
  • ExifTool - Properties extraction, identification, metadata editing
  • FIDO (Format Identification for Digital Objects) - A PRONOM based, command line, file format identification tool written in Python
  • FIDOO - A PRONOM based, online file format identification tool written in JavaScript and HTML5
  • FITS (File Information Tool Set) - FITS allows data curators to identify, validate, and extract technical metadata for the objects in their digital repository.
  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • FileAlyzer - FileAlyzer allows a basic analysis of files (showing file properties and file contents in hex dump form) and is able to interpret common file contents like resources structures (like text, graphics, HTML, media and PE).
  • GNU libextractor - GNU libextractor is a library used to extract meta data from files of arbitrary type.
  • GetID3() - Extracts technical and embedded descriptive metadata from common multimedia file formats.
  • IText - PDF library for manipulation, content extraction and creation
  • Index.dat Analyzer v2.5 - Index.dat Analyzer is a tool to view, examine and delete contents of index.dat files.
  • JHOVE (Harvard Object Validation Environment) - JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
  • JHOVE2 - JHOVE2 allows data curators to characterise the digital objects in their repositories.
  • JWAT - Java Web Archive Toolkit
  • Jp2StructCheck - Simple JP2 file structure checker
  • Jpylyzer - JP2 validation + properties extraction
  • Keith Humphreys' PhraseRate - PhraseRate is a program, developed by Keith Humphreys, for extracting a set of meaningful, attractive keywords and key phrases from a web page describing the content of that page.
  • Lingfo - Lingfo provides a library for developers to use to extract information from Microsoft Excel spreadsheet files.
  • MP3::Tag - MP3::Tag is a module for reading tags of MP3 audio files.
  • Mdqc - Tool for managing and comparing digital asset metadata
  • MediaInfo - Supplies technical and tag information about a video or audio file.
  • Metadata Extraction Tool - Metadata Extraction Tool automatically extracts a limited set of metadata from the headers of digital files.
  • NARA File Analyzer and Metadata Harvester - NARA File Analyzer and Metadata Harvester allows a user to analyze the contents of a file system or external drive and generates statistics about the contents of the contained directories.
  • NARA Video Frame Analyzer - NARA Video Frame Analyzer analyzes technical properties of individual frames of a video file in order to detect quality issues within digitized video files.
  • Nanite - A friendly swarm of format-identifying robots
  • ODF Validator - ODF Validator is a tool that validates OpenDocument files and checks them for certain conformance criteria.
  • Officeparser.py - officeparser.py is a Python script that parses the format of OLE compound documents used by Microsoft Office applications.
  • OpenJPEG - The OpenJPEG library is an open-source JPEG 2000 codec written in C language.
  • PDF Tools (by Didier Stevens) - Tools for parsing and analysing PDF documents
  • PERICLES Extraction Tool (PET) - A tool to capture contextual information in a sheer curation scenario
  • Pagelyzer - Suite of tools for detecting changes in web pages and their rendering
  • Pdftk - PDF manipulation tool
  • Peepdf - peepdf is a Python tool to explore PDF files in order to find out if the file can be harmful or not.
  • Python XMP Toolkit - Library for working with XMP metadata, as well as reading/writing XMP metadata stored in many different file formats
  • Qpdf - QPDF is a command-line program that does structural, content-preserving transformations on PDF files
  • Warctools - Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
  • Web Archive Discovery - Indexing and discovery tools for web archives.
  • WordHoard - WordHoard is an application for the close reading and scholarly analysis of deeply tagged texts.
  • Xpdf - Open source PDF viewer that includes PDF information extractor and font analyzer
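Many of the tools above read technical metadata directly from file headers. As a small illustration, the Python sketch below pulls the image dimensions, bit depth and colour type out of the IHDR chunk that opens every PNG file. The function name and the returned dictionary keys are assumptions for this example, and it does not verify the chunk CRC or read any later chunks.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header_metadata(data: bytes) -> dict:
    """Extract basic technical metadata from the leading IHDR chunk of a PNG."""
    if len(data) < 29 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    # After the 8-byte signature: 4-byte chunk length, 4-byte chunk type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    # IHDR begins with width, height (4 bytes each), bit depth, colour type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
    }
```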

Metadata Processing

Tools that support the processing or management of metadata.

  • ArchivesSpace - ArchivesSpace is the next-generation web-based archives information management system, designed by archivists and supported by diverse archival repositories.
  • Archivists' Toolkit - The Archivists' Toolkit, or the AT, is the first open source archival data management system to provide broad, integrated support for the management of archives.
  • Archon - Archon automatically publishes archival descriptive information and digital archival objects in a user-friendly website.
  • BWF MetaEdit - BWF MetaEdit permits embedding, validating, and exporting of metadata in Broadcast WAVE Format (BWF) files.
  • CSV Validator - Validation of CSV files against user-defined schema
  • Collectus -- A Digital Object Collector Tool - The UVa Library's Collectus digital object collector tool allows users to collect image or text objects from a repository.
  • ContextMiner - ContextMiner is a framework to collect, analyze, and present the contextual information along with the data.
  • Curator's Workbench - Curator's Workbench is a tool that automates and streamlines the process of preparing collections of digital materials for submission to a repository
  • DV Analyzer - DV Analyzer is a technical quality control and reporting tool that examines DV streams in order to report errors in the tape-to-file transfer process.
  • Duke Data Accessioner - Data Accessioner provides a graphical user interface to aid in migrating data from physical media to a dedicated file server, documenting the process and using MD5 checksums to identify any errors introduced in transfer.
  • EPADD - ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
  • Exempi - Exempi is a library for handling XMP metadata, based on the Adobe XMP SDK
  • ExifTool - Properties extraction, identification, metadata editing
  • Exiv2 - Exiv2 is a C++ library and a command line utility to manage image metadata.
  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • Gumshoe - Search interface for metadata extracted from forensic disk images.
  • ICA-AtoM - ICA-AtoM allows organisations to create standards-based descriptions of their archival holdings and subsequently publish them to the Web.
  • ImageVerifier - ImageVerifier (IV for short) traverses a hierarchy of folders looking for image files to verify. It can verify TIFFs, JPEGs, PSDs, DNGs, and non-DNG raws (e.g., NEF, CR2).
  • Karen's Directory Printer - Karen's Directory Printer can print the name of every file on a drive, along with the file's size, date and time of last modification, and attributes (Read-Only, Hidden, System and Archive).
  • Mdqc - Tool for managing and comparing digital asset metadata
  • NESSTAR - Nesstar suite is an online publishing platform for organisations wishing to share datasets both internally and with the wider web.
  • OpenWMS: Workflow Management System for Digital Objects - The OpenWMS is a platform-independent, open source, web-accessible system that can be used as a standalone application or integrated with other repository architectures by a wide range of organizations.
  • PAIRTREE Library - software library that supports the mapping between identifiers and filepaths according to the Pairtree Curation Microservices Specification.
  • PREMIS in METS (PiM) Toolbox - PREMIS in METS Toolbox was developed to support the implementation of PREMIS in the METS container format.
  • Python XMP Toolkit - Library for working with XMP metadata, as well as reading/writing XMP metadata stored in many different file formats
  • ReDBox - ReDBox and Mint are two complementary applications designed to create, store, and provide access to research metadata.
  • Rosetta - Ex Libris Rosetta enables institutions to preserve and provide access to the collections in their care.
  • SobekCM - SobekCM is a digital repository and digital scholarship/publishing system which enables easy deposit, preservation, and access for all types of digital content, tailored to the needs of galleries, libraries, archives, museums, scholars, and researchers.
  • Tree - Tree displays the directory structure of a path or of the disk in a drive graphically.
  • USGS Formal metadata: information and software - This page links to information and tools from the USGS.
  • Voyeur - Voyeur is a web-based text analysis environment that can take texts in a variety of formats and from different locations in order to perform lexical analysis, export data to other tools, and embed live tools into remote websites.
  • WCT (Web Curator Tool) - Web Curator Tool (WCT) is a workflow management application for selective web archiving.
  • XMP metadata support in JabRef - With XMP support the JabRef team tries to bring the advantages of metadata to the world of reference managers.

Persistent Identification

Tools that support the unique and persistent identification of files or intellectual entities.

  • DataCite - DataCite works with data centres to assign persistent identifiers to datasets using the Digital Object Identifier (DOI) infrastructure.
  • EZID - EZID (easy-eye-dee) makes it easy to create and manage unique, persistent identifiers.
  • WebCite - WebCite is an on-demand web archiving service that takes snapshots of Internet-accessible digital objects at the behest of users, storing the data on their own servers and assigning unique identifiers to those instances of the material.

Quality Assurance

Tools that support quality checking of digital resources, identifying damaged, incomplete or low quality data. Typically used to identify damage introduced via processes such as format migration or digitisation.

  • AsTiffTagViewer - AsTiffTagViewer is a TIFF Tag Viewer application.
  • Bad Peggy - Scans for damaged images and photos.
  • Checkit tiff - a tool to validate TIFF files against given configuration profile
  • DV Analyzer - DV Analyzer is a technical quality control and reporting tool that examines DV streams in order to report errors in the tape-to-file transfer process.
  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • Fingerdet - QA tool for detecting fingers on digitised pages
  • GNU Diffutils - GNU Diffutils is a package of several programs related to finding differences between files.
  • ImageVerifier - ImageVerifier (IV for short) traverses a hierarchy of folders looking for image files to verify. It can verify TIFFs, JPEGs, PSDs, DNGs, and non-DNG raws (e.g., NEF, CR2).
  • Jp2StructCheck - Simple JP2 file structure checker
  • Jpylyzer - JP2 validation + properties extraction
  • KOST-Simy - The KOST-Simy application is used to compare images.
  • KOST-Val - KOST-Val is an open source validator for different file formats (TIFF, SIARD, PDF/A, JP2, JPEG) and Submission Information Package (SIP).
  • MP3val - MP3val is a small, high-speed, free software tool for checking MPEG audio files' integrity.
  • Matchbox Tool - Matchbox: Duplicate detection tool for digital document collections.
  • Mdqc - Tool for managing and comparing digital asset metadata
  • NARA Video Frame Analyzer - NARA Video Frame Analyzer analyzes technical properties of individual frames of a video file in order to detect quality issues within digitized video files.
  • Pagelyzer - Suite of tools for detecting changes in web pages and their rendering
  • Qctools - Analyse digital video and detect corruption/artefacts
  • ReACT (Resource Audit and Comparison Tool) - A file audit and comparison tool using Microsoft Excel and VBA.
  • SIARD-VAL - SIARD-Val is an open source validator for SIARD files.
  • SIARDexcerpt - SIARDexcerpt is a Java-based application that searches and extracts individual records of SIARD files.
  • SobekCM - SobekCM is a digital repository and digital scholarship/publishing system which enables easy deposit, preservation, and access for all types of digital content, tailored to the needs of galleries, libraries, archives, museums, scholars, and researchers.
  • TIFF-Val - TIFF-Val is an open source validator for TIFF files.
  • Web Application Testing with iMacros - iMacros makes it easy to test web-based applications.
  • XcorrSound - The xcorrSound package compares sound waves using cross correlation.

Validation

Tools that support the validation of digital files, typically against a file format specification.

  • 3-Heights(TM) PDF Validator - 3-Heights(TM) PDF Validator from PDF-Tools AG.
  • Apache PDFBox - Java PDF library for creation, manipulation, validation and content extraction of PDF documents
  • BWF MetaEdit - BWF MetaEdit permits embedding, validating, and exporting of metadata in Broadcast WAVE Format (BWF) files.
  • Bad Peggy - Scans for damaged images and photos.
  • CSV Validator - Validation of CSV files against user-defined schema
  • Checkit tiff - a tool to validate TIFF files against given configuration profile
  • EpubCheck - Validator for EPUB files
  • FITS (File Information Tool Set) - FITS allows data curators to identify, validate, and extract technical metadata for the objects in their digital repository.
  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • Flint - Validates a file against a policy, using common validation tools
  • ImageVerifier - ImageVerifier (IV for short) traverses a hierarchy of folders looking for image files to verify. It can verify TIFFs, JPEGs, PSDs, DNGs, and non-DNG raws (e.g., NEF, CR2).
  • JHOVE (Harvard Object Validation Environment) - JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
  • JHOVE2 - JHOVE2 allows data curators to characterise the digital objects in their repositories.
  • JWAT - Java Web Archive Toolkit
  • Jp2StructCheck - Simple JP2 file structure checker
  • Jpylyzer - JP2 validation + properties extraction
  • KOST-Simy - The KOST-Simy application is used to compare images.
  • KOST-Val - KOST-Val is an open source validator for different file formats (TIFF, SIARD, PDF/A, JP2, JPEG) and Submission Information Package (SIP).
  • MP3val - MP3val is a small, high-speed, free software tool for checking MPEG audio files' integrity.
  • Media conch - Media Conch is an implementation checker, policy checker and fixer for audiovisual files, with a focus on Matroska, LPCM and FFV1.
  • ODF Validator - ODF Validator is a tool that validates OpenDocument files and checks them for certain conformance criteria.
  • PDF Tools (by Didier Stevens) - Tools for parsing and analysing PDF documents
  • PDFTron PDF-A Manager - PDF/A Manager is a PDF/A (ISO 19005) validation and conversion software.
  • PREMIS in METS (PiM) Toolbox - PREMIS in METS Toolbox was developed to support the implementation of PREMIS in the METS container format.
  • SIARD-VAL - SIARD-Val is an open source validator for SIARD files.
  • TIFF-Val - TIFF-Val is an open source validator for TIFF files.
  • VeraPDF - PDF/A validation tool
  • W3C Markup Validation Service - This is the World Wide Web Consortium's validation tool.
  • Warctools - Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
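
As a rough illustration of what validating a file against a format specification involves, the sketch below checks only the 8-byte header of a classic TIFF stream (byte-order mark, magic number 42, offset of the first image file directory). It is an assumption-laden fragment for orientation, not how any tool listed above is implemented; the function name check_tiff_header is hypothetical, and BigTIFF (magic number 43) is deliberately treated as out of scope.

```python
import struct


def check_tiff_header(data):
    """Return a list of problems found in the first 8 bytes of a TIFF stream."""
    if len(data) < 8:
        return ["file shorter than the 8-byte TIFF header"]

    # Bytes 0-1 declare the byte order used by the rest of the file.
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian
    elif order == b"MM":
        endian = ">"  # big-endian
    else:
        return ["byte-order mark is neither 'II' nor 'MM'"]

    # Bytes 2-3 hold the magic number, bytes 4-7 the offset of the first IFD.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])

    problems = []
    if magic != 42:
        problems.append(f"magic number is {magic}, expected 42")
    if ifd_offset < 8:
        problems.append("first IFD offset points inside the header")
    if ifd_offset % 2:
        problems.append("first IFD offset is not on a word boundary")
    return problems
```

An empty list means only that the header is plausible. A real validator goes on to walk every directory entry, check tag types and counts, and apply any local policy or profile.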

Preservation Action

Functions that support the DCC Lifecycle stage defined as "Undertake actions to ensure long-term preservation and retention of the authoritative nature of data. Preservation actions should ensure that data remains authentic, reliable and usable while maintaining its integrity. Actions include data cleaning, validation, assigning preservation metadata, assigning representation information and ensuring acceptable data structures or file formats."

De-Duplication

Tools that enable the identification and/or removal of duplicate or similar files.

  • DROID sqlite analysis - Analysis and automatic generation of summary information from DROID output
  • Emailchemy - Converts proprietary emails to standard portable formats
  • FileVerifier++ - Windows utility for verifying file contents
  • Fslint - Set of utilities to find and clean various forms of lint on a filesystem, such as duplicate files, empty directories, and bad file names.
  • GNU Diffutils - GNU Diffutils is a package of several programs related to finding differences between files.
  • Java library implementing Pairtree - The PAIRTREE LIBRARY is a software library that supports the mapping between identifiers and filepaths according to the Pairtree Specification.
  • Matchbox Tool - Matchbox: Duplicate detection tool for digital document collections.
  • SSDeep - Recursive piecewise hashing tool
  • The DeDuplicator (Heritrix add-on module) - The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
  • XcorrSound - The xcorrSound package compares sound waves using cross correlation.
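
For orientation, exact-duplicate detection can be sketched in a few lines: hash every file and group paths that share a digest. This is a minimal sketch, assuming SHA-256 as the digest and hypothetical function names (file_digest, find_duplicates); it does not reproduce the approach of any tool listed above, and it finds byte-identical files only.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group regular files under root by digest; return groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Detecting similar rather than identical files needs a different technique, such as the piecewise hashing or cross correlation mentioned in the entries above.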

Decryption

Tools for recovering passwords or unlocking encrypted digital files.

  • AccessData Decryption Tools - This page gives information on AccessData Decryption Tools.
  • ElcomSoft - ElcomSoft offers numerous password recovery applications.
  • Password Recovery Software - Passware software recovers or resets passwords for Windows, Word, Excel, QuickBooks, Access, Acrobat, and more than 180 document types.
  • Qpdf - QPDF is a command-line program that does structural, content-preserving transformations on PDF files

Emulation

Tools that enable the emulation or virtualisation of a hardware or software system on another system.

  • Dioscuri - Dioscuri is a computer hardware emulator, specifically designed to be used as part of a digital preservation strategy.
  • IBM Digital Asset Preservation Tool - IBM's Digital Asset Preservation Tool is a proof-of-concept demonstration of the Universal Virtual Computer solution that provides long-term access to JPEG and GIF87a files.
  • JPC - JPC is the fast pure Java x86 PC emulator.
  • KEEP Emulation Framework - KEEP Emulation Framework (EF) allows users to view and interact with digital files that otherwise would require obsolete hardware and software.
  • Kernel-based virtual machine - KVM (for Kernel-based Virtual Machine) is a full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT or AMD-V).
  • Linux-VServer - Linux-VServer provides virtualization for GNU/Linux systems.
  • OpenVZ wiki - OpenVZ is container-based virtualization for Linux.
  • Recompute - Automatically generates "playable" virtual machines from source code on github
  • VMware Player - VMware Player is the easiest way to run multiple operating systems at the same time on your PC.
  • VirtualBox - VirtualBox is a powerful x86 and AMD64/Intel64 virtualization product for enterprise as well as home use.
  • Windows Virtual PC - Windows XP Mode and Windows Virtual PC, available on Windows 7 Professional and Windows 7 Ultimate, allow you to run multiple Windows environments, such as Windows XP Mode, from your Windows 7 desktop.
  • Wine - Wine lets you run Windows software on other operating systems.
  • Xen - The Xen hypervisor, the powerful open source industry standard for virtualization, offers a powerful, efficient, and secure feature set for virtualization of x86, x86_64, IA64, ARM, and other CPU architectures.

File Format Migration

Tools that support the transformation of data from one file format to another.

  • AccessToSiard - A collection of scripts to automatically convert MS Access files to the SIARD format.
  • Antiword - Antiword is a free MS Word reader for Linux and RISC OS.
  • Apache PDFBox - Java PDF library for creation, manipulation, validation and content extraction of PDF documents
  • Apache POI - the Java API for Microsoft Documents - The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2).
  • Archivematica - Archivematica is a digital preservation system that automates the process of preparing digital objects for ingest into a repository and an access system
  • Audio/Video to WAV Converter - This tool converts audio and video files to WAV format.
  • CDS Convert - CDS Convert is a suite of tools that allow conversion of documents, presentations and images between different software formats.
  • CHRONOS - Database Retirement, Partial and Ongoing Database Archiving, Application Retirement.
  • CSV2SIARD - A tool to create SIARD containers from CSV files.
  • Calibre - An e-book management tool, including viewer, migration, and file conversion features among others.
  • Catdoc & xls2csv - catdoc is a program that reads one or more Microsoft Word files and outputs text to standard output.
  • DANS (Data Archiving and Networked Services) DBF - DANS DBF Library is a Java library for reading and writing xBase database files.
  • DANS MIXED - Migration to Intermediate XML for Electronic Data.
  • DBpoweramp Music Converter (dMC) - dBpoweramp Music Converter (dMC) is an audio conversion tool.
  • Db-preservation-toolkit - Enables conversion between database formats or dumping from live database systems for the purposes of preservation.
  • DeepArc - Intended for preserving web sites from the back-end, this is a database-to-XML curation tool.
  • DocMorph: Electronic Document Conversion - The U.S. National Library of Medicine's (NLM) document conversion tools make the exchange and use of biomedical library electronic information easier for librarians, library users, and the general public
  • EXIF to DC XML normaliser - Extract EXIF data and normalise it to DC XML.
  • Easy CD-DA Extractor - Easy CD-DA Extractor is CD Ripper, Music Converter, Audio Converter, Metadata Editor, and CD/DVD burning software.
  • Emailchemy - Converts proprietary emails to standard portable formats
  • FFmpeg - FFmpeg is a complete, cross-platform solution to record, convert and stream audio and video.
  • ImageMagick - ImageMagick® is a software suite to create, edit, compose, or convert bitmap images.
  • JJ2000 - Pure Java implementation of a JPEG2000 decoder
  • JWAT - Java Web Archive Toolkit
  • Kakadu - JPEG 2000 SDK, includes encoder/decoder
  • Lingfo - Lingfo provides a library for developers to use to extract information from Microsoft Excel spreadsheet files.
  • LuraDocument PDF Compressor - LuraDocument PDF Compressor is a document conversion engine.
  • MIXED (Migration to Intermediate XML for Electronic Data) - MIXED (Migration to Intermediate XML for Electronic Data) is a web service that converts tabular data files such as spreadsheets and databases to the Standard Data Format for Preservation (SDFP), a supplier-independent XML format.
  • MPG321 - mpg321 is a command-line mp3 player. mpg321 is used for frontends, as an mp3 player and as an mp3 to wave file decoder.
  • MPP Viewer - MPP Viewer is a viewer for Microsoft Project files
  • MSIL Disassembler (Ildasm.exe) - The MSIL Disassembler is a companion tool to the MSIL Assembler (Ilasm.
  • Open Office - OpenOffice.org 3 is the leading open-source office software suite for word processing, spreadsheets, presentations, graphics, databases and more.
  • Open Video Converter - This tool is for video conversion, splitting and editing.
  • OpenJPEG - The OpenJPEG library is an open-source JPEG 2000 codec written in C language.
  • OpenXML/ODF Translator Add-in for Office - The goal for this project is to provide translators to allow for interoperability between applications based on ODF (OpenDocument) 1.
  • Oracle Outside In Technology - Outside In Technology is a suite of software development kits (SDKs) that provides developers with a comprehensive solution to access, transform and control the contents of over 500 unstructured file formats.
  • PDFTron PDF-A Manager - PDF/A Manager is a PDF/A (ISO 19005) validation and conversion software.
  • PREMIS in METS (PiM) Toolbox - PREMIS in METS Toolbox was developed to support the implementation of PREMIS in the METS container format.
  • RODA DBML - Migrates databases to an XML schema, DBML. Can then provide access by dumping DBML to MySQL and showing it in phpMyAdmin.
  • Rosetta - Ex Libris Rosetta enables institutions to preserve and provide access to the collections in their care.
  • SIARD Suite - SIARD Suite is a freeware tool for converting the contents of relational databases into the SIARD format.
  • Ssconvert - ssconvert is a command line utility to convert spreadsheet files between various spreadsheet file formats.
  • WMDecode - WMDecode is used for extracting files from winmail.
  • Xena - Detecting the file formats of digital objects; converting digital objects into open formats for preservation.

File Management

Tools that support general file management activities such as viewing or renaming.

  • BAT: BnfArcTools - BAT is a Perl package for processing Internet Archive ARC, DAT and CDX file format.
  • Bulk Rename Utility - Bulk Rename Utility is a free file renaming software for Windows. Bulk Rename Utility allows you to easily rename files and entire folders based upon extremely flexible criteria.
  • Dcfldd - dcfldd is an enhanced version of GNU dd with features useful for forensics and security.
  • DiskView - DiskView shows you a graphical map of your disk, allowing you to determine where a file is located or, by clicking on a cluster, seeing which file occupies it.
  • Emailchemy - Converts proprietary emails to standard portable formats
  • Explore2fs - Explore2fs is a GUI explorer tool for accessing ext2 and ext3 filesystems.
  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • Fslint - Set of utilities to find and clean various forms of lint on a filesystem, such as duplicate files, empty directories, and bad file names.
  • Java library implementing Pairtree - The PAIRTREE LIBRARY is a software library that supports the mapping between identifiers and filepaths according to the Pairtree Specification.
  • ReACT (Resource Audit and Comparison Tool) - A file audit and comparison tool using Microsoft Excel and VBA.
  • ReNamer - ReNamer is a very powerful and flexible file renaming tool.
  • The Rename - bulk renaming of files - Bulk renaming of files - free downloadable software
  • TreeSize Professional - disk space management software - Manage disk space and scan your hard disks.
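
The identifier-to-filepath mapping mentioned in the Pairtree entry above can be illustrated with a short sketch. It follows my reading of the Pairtree specification (hex-encode unsafe characters as ^hh, substitute "/", ":" and "." with "=", "+" and ",", then split into two-character directory names). The function name pairtree_path is hypothetical and is not the API of the listed Java library.

```python
def pairtree_path(identifier):
    """Map an identifier to a Pairtree path of two-character segments."""
    # Step 1: hex-encode bytes that are unsafe in file names as ^hh.
    unsafe = set('"*+,<=>?\\^|')
    encoded = []
    for byte in identifier.encode("utf-8"):
        char = chr(byte)
        if byte < 0x21 or byte > 0x7E or char in unsafe:
            encoded.append(f"^{byte:02x}")
        else:
            encoded.append(char)
    cleaned = "".join(encoded)

    # Step 2: one-to-one substitutions for characters common in identifiers.
    cleaned = cleaned.translate(str.maketrans("/:.", "=+,"))

    # Step 3: split into two-character segments; the last may be one character.
    return "/".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


print(pairtree_path("ark:/13030/xt12t3"))  # ar/k+/=1/30/30/=x/t1/2t/3
```

The mapping is reversible, which is the point of the scheme: an object's identifier can be recovered from its location on disk without consulting a separate index.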

File Recovery

Tools that support the recovery of data from damaged or corrupted storage devices such as disks.

  • Dd rescue - dd_rescue is suitable for rescuing data from a medium with errors, i.
  • Emailchemy - Converts proprietary emails to standard portable formats
  • Foremost - Foremost is a console program to recover files based on their headers, footers, and internal data structures.
  • GetDataBack - GetDataBack will recover your data if the hard drive's partition table, boot record, FAT/MFT or root directory are lost or damaged, data was lost due to a virus attack, the drive was formatted, fdisk has been run, a power failure has caused a system crash, files were lost due to a software failure, files were accidentally deleted.
  • Ontrack EasyRecovery - Ontrack EasyRecovery software products offer home users or businesses complete solutions for their data recovery, file repair and disk diagnostic needs.
  • PhotoRec - PhotoRec is file data recovery software designed to recover lost files including video, documents and archives from hard disks, CD-ROMs, and lost pictures (thus the Photo Recovery name) from digital camera memory.
  • PhotoRescue - PhotoRescue is the best and fairest picture and data recovery solution for digital film - sd cards, compact flash, memory sticks, microdrive, etc.
  • Recovery is Possible - Recovery Is Possible (RIP) is a CD or USB boot/rescue/backup/maintenance system.
  • Restorer Ultimate - Restorer Ultimate offers data recovery software.
  • Safecopy - low level data recovery tool
  • SalvageData Recovery - SalvageData Recovery software tools and products are designed to empower both IT professionals and average personal computer users with all the functionalities and features needed to successfully salvage and recover data files from any kind of logical data loss situation.
  • SpinRite - SpinRite is a magnetic storage data recovery, repair, and maintenance utility.
  • TestDisk - TestDisk is powerful free data recovery software that was primarily designed to help recover lost partitions and/or make non-booting disks bootable again when these symptoms are caused by faulty software, certain types of viruses or human error (such as accidentally deleting a Partition Table).
  • Unrm - unrm is a small shell utility that can, under some circumstances, recover almost 99% of your erased data (similar to DOS's undelete).
  • Windows data recovery with ZAR - ZAR is Windows data recovery software.

Multi Format Rendering

Tools that support the rendering of a cross section of file format or content categories.

  • Quick View Plus - View virtually all the files and e-mail attachments you need, instantly without purchasing numerous software programs.

Quality Assurance

Tools that support quality checking of digital resources, identifying damaged, incomplete or low quality data. Typically used to identify damage introduced via processes such as format migration or digitisation.

  • AsTiffTagViewer - AsTiffTagViewer is a TIFF Tag Viewer application.
  • Bad Peggy - Scans for damaged images and photos.
  • Checkit tiff - a tool to validate TIFF files against given configuration profile
  • DV Analyzer - DV Analyzer is a technical quality control and reporting tool that examines DV streams in order to report errors in the tape-to-file transfer process.
  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • Fingerdet - QA tool for detecting fingers on digitised pages
  • GNU Diffutils - GNU Diffutils is a package of several programs related to finding differences between files.
  • ImageVerifier - ImageVerifier (IV for short) traverses a hierarchy of folders looking for image files to verify. It can verify TIFFs, JPEGs, PSDs, DNGs, and non-DNG raws (e.g., NEF, CR2).
  • Jp2StructCheck - Simple JP2 file structure checker
  • Jpylyzer - JP2 validation + properties extraction
  • KOST-Simy - The KOST-Simy application is used to compare images.
  • KOST-Val - KOST-Val is an open source validator for different file formats (TIFF, SIARD, PDF/A, JP2, JPEG) and Submission Information Package (SIP).
  • MP3val - MP3val is a small, high-speed, free software tool for checking MPEG audio files' integrity.
  • Matchbox Tool - Matchbox: Duplicate detection tool for digital document collections.
  • Mdqc - Tool for managing and comparing digital asset metadata
  • NARA Video Frame Analyzer - NARA Video Frame Analyzer analyzes technical properties of individual frames of a video file in order to detect quality issues within digitized video files.
  • Pagelyzer - Suite of tools for detecting changes in web pages and their rendering
  • Qctools - Analyse digital video and detect corruption/artefacts
  • ReACT (Resource Audit and Comparison Tool) - A file audit and comparison tool using Microsoft Excel and VBA.
  • SIARD-VAL - SIARD-Val is an open source validator for SIARD files.
  • SIARDexcerpt - SIARDexcerpt is a Java-based application that searches and extracts individual records of SIARD files.
  • SobekCM - SobekCM is a digital repository and digital scholarship/publishing system which enables easy deposit, preservation, and access for all types of digital content, tailored to the needs of galleries, libraries, archives, museums, scholars, and researchers.
  • TIFF-Val - TIFF-Val is an open source validator for TIFF files.
  • Web Application Testing with iMacros - iMacros makes it easy to test web-based applications.
  • XcorrSound - The xcorrSound package compares sound waves using cross correlation.

Redaction

Tools that support the removal of selected information from digital files. Typically used for removal of sensitive information like telephone or credit card numbers from personal archives before providing access to users.

  • MRU-Blaster - MRU-Blaster is a program made to do one large task - detect and clean MRU (most recently used) lists on your computer.
  • Microsoft Office 2003 Add-in: Word Redaction v1.2 - Use the Word 2003 Redaction Add-in to hide text within Microsoft Office Word 2003 documents.
  • Microsoft Office 2003/XP Add-in: Remove Hidden Data - With this add-in you can permanently remove hidden data and collaboration data, such as change tracking and comments, from Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files.
  • RapidRedact - The RapidRedact product range provides fast, easy to use redaction tools for irreversibly blanking out (redacting) selected information, author's changes and hidden data from all electronic document types.
  • Redact-It - Provides Windows desktop and server redaction of PDF, Word, scanned TIFF images. Find, black out and remove content within documents, images or drawings.
  • Redax - Redax completely redacts (removes) text and graphics from the PDF page.

Rendering

Tools that support the rendering of digital resources so they can be viewed, printed, or otherwise accessed by users.

  • 7-Zip - 7-Zip is a file archiver with a high compression ratio
  • Calibre - An e-book management tool, including viewer, migration, and file conversion features among others.
  • DANS (Data Archiving and Networked Services) DBF - DANS DBF Library is a Java library for reading and writing xBase database files.
  • Filzip - Filzip offers full support (add and extract) for ZIP (including Quake III's PK3), BH (BlakHole), CAB (Microsoft Cabinet), JAR (Java ARchive), LHA (LZH), TAR and GZIP (TAR.
  • Gzip - gzip produces files with a .gz extension. gunzip can decompress files created by gzip, compress or pack
  • ImageMagick - ImageMagick® is a software suite to create, edit, compose, or convert bitmap images.
  • IrfanView - IrfanView is a very fast, small, compact and innovative FREEWARE (for non-commercial use) graphic viewer for Windows 9x, ME, NT, 2000, XP, 2003, 2008, Vista, Windows 7.
  • JJ2000 - Pure Java implementation of a JPEG2000 decoder
  • Kakadu - JPEG 2000 SDK, includes encoder/decoder
  • Multivalent - Multivalent works on digital documents research and development.
  • Open Office - OpenOffice.org 3 is the leading open-source office software suite for word processing, spreadsheets, presentations, graphics, databases and more.
  • Quick View Plus - View virtually all the files and e-mail attachments you need, instantly without purchasing numerous software programs.
  • Tar - The Tar program provides the ability to create tar archives, as well as various other kinds of manipulation.
  • VLC Media Player - Cross platform audio and video player based primarily on the libavcodec.
  • WinZip - WinZip is the world's most popular Windows Zip utility for file compression, file sharing, file encryption, and data backup.
  • Xpdf - Open source PDF viewer that includes PDF information extractor and font analyzer

Repair

Tools that support the repair of damaged or corrupted data.

  • Apache PDFBox - Java PDF library for creation, manipulation, validation and content extraction of PDF documents
  • Fixit tiff - fixes some issues in (potentially) baseline tiffs, as an example, invalid datetime tags, wrong tiff tag order
  • Pdftk - PDF manipulation tool

Preservation Planning

Functions that support the DCC Lifecycle stage defined as "Plan for preservation throughout the curation lifecycle of digital material. This would include plans for management and administration of all curation lifecycle actions."

Benefits

Tools that enable the identification and articulation of the benefits of preservation and curation.

Citation and Impact Tracking

Tools that support the citation of data and the tracking of the impact of usage of that data.

  • DataCite - DataCite works with data centres to assign persistent identifiers to datasets using the Digital Object Identifier (DOI) infrastructure.
  • ImpactStory - ImpactStory (previously Total-Impact) allows researchers and organisations to gather a wide range of impact metrics about multiple forms of scholarly output.
  • Mendeley - Mendeley is a combination web service and desktop application that allows users to create, manage, and share collections of references.
  • ReaderMeter - ReaderMeter is a web-based service that compiles readership information about scientific content to create an estimate of the content's community impact.

Content Profiling

Tools that build a profile of the characteristics of digital content, typically by combining or analysing a number of sources of information such as extracted metadata and file format identifications.

  • Brunnhilde - Siegfried-based characterization of directories and disk images
  • C3PO - C3PO is a content profiling tool for visualization and preservation analysis
  • DROID sqlite analysis - Analysis and automatic generation of summary information from DROID output
  • Web Archive Discovery - Indexing and discovery tools for web archives.
  • Yara - Pattern matching tool

Costing

Tools that support the calculation or prediction of the cost of preservation or curation activities.

Data Management Planning

Tools that support the development of research data management plans and related activities.

  • CARDIO - CARDIO is a benchmarking tool for data management strategy development
  • D-Net Software Kit - Software Kit creates a network of repositories that share the infrastructure services necessary to process and provide access to digital content.
  • DMAOnline (Data Management Administration Online) - Provides a single dashboard view of how various departments contribute to RDM activities and how an institution is performing in terms of its compliance with policies
  • DMPTool - DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies, and to receive tailored institutional guidance to help them in the process.
  • DMPonline - DMPonline is the DCC's data management planning tool.

Organisational Audit

Tools that enable an audit of an organisation's capability with respect to preservation, typically relating to a maturity model.

  • CARDIO - CARDIO is a benchmarking tool for data management strategy development
  • DMAOnline (Data Management Administration Online) - Provides a single dashboard view of how various departments contribute to RDM activities and how an institution is performing in terms of its compliance with policies
  • DRAMBORA - DRAMBORA offers a quantifiable insight into the severity of risks faced by repositories right now, and an effective means for reporting these.
  • Data Asset Framework - The Data Asset Framework (formerly the Data Audit Framework) provides organisations with the means to identify, locate, describe and assess how they are managing their research data assets.
  • Embedding Repositories Self-Assessment Tool - Embedding Repositories Self-Assessment Tool consists of a series of questions designed to quantify the degree to which a digital repository is ‘embedded’ within its institution – the extent to which both the organisation's research and its administrative culture recognise the repository’s value and take full advantage of its capacity.
  • NDSA Levels of Preservation - The "Levels of Digital Preservation" are a tiered set of recommendations for how organizations should begin to build or enhance their digital preservation activities.
  • OPD for RDM - An RDF-based list of basic RDM infrastructure components, intended to make this infrastructure more visible and easier to identify
  • RMCAS - RMCAS is an assessment tool for organisations wishing to map their current records management infrastructure against community best-practice.

Planning

Tools that support the planning of preservation activities.

  • AIDA - Assessing Institutional Digital Assets: Self-assessment tool for describing institutional readiness and capabilities for digital asset management and digital preservation
  • CARDIO - CARDIO is a benchmarking tool for data management strategy development
  • D-Net Software Kit - Software Kit creates a network of repositories that share the infrastructure services necessary to process and provide access to digital content.
  • DMAOnline (Data Management Administration Online) - Provides a single dashboard view of how various departments contribute to RDM activities and how an institution is performing in terms of its compliance with policies
  • DMPTool - DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies, and to receive tailored institutional guidance to help them in the process.
  • DMPonline - DMPonline is the DCC's data management planning tool.
  • DRAMBORA - DRAMBORA offers a quantifiable insight into the severity of risks faced by repositories right now, and an effective means for reporting these.
  • Data Asset Framework - The Data Asset Framework (formerly the Data Audit Framework) provides organisations with the means to identify, locate, describe and assess how they are managing their research data assets.
  • Digital Preservation Capability Maturity Model (DPCMM) - Maturity / gap analysis model for digital preservation
  • Digital Preservation Management Tools and Techniques - A toolset for developing standards compliant digital preservation management documentation on an array of topics
  • Embedding Repositories Self-Assessment Tool - Embedding Repositories Self-Assessment Tool consists of a series of questions designed to quantify the degree to which a digital repository is ‘embedded’ within its institution – the extent to which both the organisation's research and its administrative culture recognise the repository’s value and take full advantage of its capacity.
  • HoliRisk - HoliRisk is a framework and online tool to support the development of a risk assessment based on principles from ISO 31000.
  • NDSA Levels of Preservation - The "Levels of Digital Preservation" are a tiered set of recommendations for how organizations should begin to build or enhance their digital preservation activities.
  • PLATO - Plato is a preservation-planning tool for organisations charged with safeguarding digital materials.
  • RMCAS - RMCAS is an assessment tool for organisations wishing to map their current records management infrastructure against community best-practice.
  • SCOUT - Scout is a preservation watch system.
  • Tufts Submission-Agreement Builder Tool - SABT is a web-based tool that guides records creators and records managers through the process of creating submission agreements, both for single transfers and for standing submissions.

Policy

Tools that support the development and management of digital preservation policy.

  • Catalogue of Policy Elements - Supports creation of new preservation policies as well as planning and watch activities.
  • Digital Preservation Management Tools and Techniques - A toolset for developing standards compliant digital preservation management documentation on an array of topics
  • HoliRisk - HoliRisk is a framework and online tool to support the development of a risk assessment based on principles from ISO 31000.
  • Media conch - Media Conch is an implementation checker, policy checker and fixer for audiovisual files, with a focus on Matroska, LPCM and FFV1.
  • OpenDOAR - OpenDOAR is a simple, web-based tool that guides repository administrators through the process of creating basic policies for the submission, re-use, and preservation of digital materials.

Store

Functions that support the DCC Lifecycle stage defined as "Store the data in a secure manner adhering to relevant standards."

Active Data Storage

Tools that support the storage, management, and ultimately the preservation, of evolving research data.

  • DataStage - DataStage is a flexible data storage system that provides controlled access, secure backup, and the ability to transfer selected files to a more permanent archiving facility.
  • Dataverse - The Dataverse is an open source web application to share, preserve, cite, explore and analyze research data.

Backup

Tools that support the backing up of digital data to another storage location, typically in a scheduled manner.

  • Carbonite - an online backup service that automatically backs up documents, e-mails, music, photos, and settings. Info gathered early March 2013.
  • Chronopolis - "Chronopolis digital preservation network provides services for the long-term preservation and curation of America's digital holdings"
  • Data Vault - A storage broker and front end for archiving research data that is no longer active but that does not have a need for open publication
  • Dropbox - Dropbox is a free service for keeping photos, documents, and videos available across devices: any file saved to your Dropbox is automatically saved to all your computers, phones and the Dropbox website. Dropbox also supports sharing files with others, and the synchronised copies remain available if a single device is damaged or lost.
  • Glacier (Amazon) - Amazon Glacier is a secure, durable, and extremely low-cost cloud storage service for data archiving and long-term backup.
  • SafeBack - SafeBack is used to create mirror-image (bit-stream) backup files of hard disks or to make a mirror-image copy of an entire hard disk drive or partition.

File Management

Tools that support general file management activities such as viewing or renaming.

  • BAT: BnfArcTools - BAT is a Perl package for processing the Internet Archive ARC, DAT and CDX file formats.
  • Bulk Rename Utility - Bulk Rename Utility is free file renaming software for Windows. It allows you to rename files and entire folders based upon flexible criteria.
  • Dcfldd - dcfldd is an enhanced version of GNU dd with features useful for forensics and security.
  • DiskView - DiskView shows you a graphical map of your disk, allowing you to determine where a file is located or, by clicking on a cluster, seeing which file occupies it.
  • Emailchemy - Converts proprietary emails to standard portable formats
  • Explore2fs - Explore2fs is a GUI explorer tool for accessing ext2 and ext3 filesystems.
  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • Fslint - Set of utilities to find and clean various forms of lint on a filesystem, such as duplicate files, empty directories, and bad file names.
  • Java library implementing Pairtree - The PAIRTREE LIBRARY is a software library that supports the mapping between identifiers and filepaths according to the Pairtree Specification.
  • ReACT (Resource Audit and Comparison Tool) - A file audit and comparison tool using Microsoft Excel and VBA.
  • ReNamer - ReNamer is a very powerful and flexible file renaming tool.
  • The Rename - bulk renaming of files - Free downloadable software for bulk renaming of files
  • TreeSize Professional - disk space management software - Manage disk space and scan your hard disks.
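
The identifier-to-filepath mapping mentioned in the Pairtree entry above can be illustrated with a short sketch. This is not the API of the Java library listed here; the function names are hypothetical, and the character rules follow my reading of the Pairtree Specification (hex-encode a small set of special characters, substitute three common ones, then split the result into two-character directory names).

```python
# Characters the Pairtree spec asks to hex-encode as ^hh (assumed from the spec).
HEX_ENCODE = set('"*+,<=>?\\^|')


def clean_identifier(identifier):
    """Apply the two-step Pairtree character cleaning to an identifier."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        # Step 1: hex-encode special characters and anything outside
        # visible ASCII (0x21-0x7e).
        if byte < 0x21 or byte > 0x7E or ch in HEX_ENCODE:
            out.append("^%02x" % byte)
        else:
            out.append(ch)
    # Step 2: single-character substitutions: / -> =, : -> +, . -> ,
    return "".join(out).translate(str.maketrans("/:.", "=+,"))


def pairtree_path(identifier):
    """Split the cleaned identifier into two-character path segments."""
    cleaned = clean_identifier(identifier)
    segments = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join(segments)


print(pairtree_path("ark:/13030/xt12t3"))  # ar/k+/=1/30/30/=x/t1/2t/3
```

Because the mapping is purely textual, the original identifier can be recovered from the directory path by reversing the two cleaning steps.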

Fixity

Tools that support the verification of file fixity, typically through the generation and validation of checksum based manifests.

  • ACE (Audit Control Environment) - The Auditing Control Environment is a mature set of software designed to help libraries and archives prove their holdings are intact and trustworthy.
  • BIL (BagIt Library) - BagIt Library is a Java software library that supports the creation, manipulation and validation of bags.
  • BagIt Transfer Utilities - BagIt transfer Utilities are a collection of tools developed for the purpose of validation and transfer of bags.
  • Bagger - GUI application to facilitate the creation and verification of BagIt bags.
  • Cksum Unix command - cksum computes a cyclic redundancy check (CRC) checksum for each given file, or standard input if none are given
  • File Analyzer and Metadata Harvester V2 - The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.
  • FileVerifier++ - Windows utility for verifying file contents
  • Fixi - Fixi is a command-line utility that indexes, verifies, and updates checksum information for collections of files.
  • Fixity - Fixity monitoring for small-medium collections
  • Md5deep and hashdeep - md5deep is a set of programs to compute MD5, SHA-1, SHA-256, Tiger, or Whirlpool message digests on an arbitrary number of files. hashdeep is a program to compute, match, and audit hashsets.
  • Md5sum Unix command - md5sum computes a 128-bit checksum (or fingerprint or message-digest) for each specified file.
  • Md5summer - MD5summer is an application for Microsoft Windows 9x, NT, ME, 2000 and XP which generates and verifies md5 checksums.
  • NARA File Analyzer and Metadata Harvester - NARA File Analyzer and Metadata Harvester allows a user to analyze the contents of a file system or external drive and generates statistics about the contents of the contained directories.
  • Python checkm package - This is a Python implementation of the checkm specification.
  • Rhash - RHash (Recursive Hasher) is a console utility for computing and verifying hash sums of files.
  • SAFE Archive Audit System - Policy-based replication and Auditing of LOCKSS networks.
  • SSDeep - Recursive piecewise hashing tool
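
As a rough illustration of what "generation and validation of checksum based manifests" involves, the following minimal sketch builds a SHA-256 manifest for a directory and later checks the directory against it. It uses only the Python standard library and does not reproduce the behaviour or file formats of any tool listed above; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Compare the files under root with a previously stored manifest.

    Returns (changed, missing, unexpected) as sorted lists of relative paths.
    """
    current = build_manifest(root)
    changed = sorted(
        k for k in manifest if k in current and current[k] != manifest[k]
    )
    missing = sorted(k for k in manifest if k not in current)
    unexpected = sorted(k for k in current if k not in manifest)
    return changed, missing, unexpected
```

In practice the manifest would be written to disk (for example as one "digest path" pair per line) and stored separately from the content, so that a later run of the verification step can detect silent corruption, deletions, and unrecorded additions.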

Managing Active Research Data

Tools that enable researchers to manage data from its point of creation, facilitating its productive use in the present, but also establishing the support structures necessary to ensure its future survival.

  • CRunch - cRunch provides an infrastructure for exploratory data analysis with the statistical programming language and environment R
  • D-Net Software Kit - Software Kit creates a network of repositories that share the infrastructure services necessary to process and provide access to digital content.
  • DMPTool - DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies, and to receive tailored institutional guidance to help them in the process.
  • DMPonline - DMPonline is the DCC's data management planning tool.
  • DataCite - DataCite works with data centres to assign persistent identifiers to datasets using the Digital Object Identifier (DOI) infrastructure.
  • DataStage - DataStage is a flexible data storage system that provides controlled access, secure backup, and the ability to transfer selected files to a more permanent archiving facility.
  • Dataverse - The Dataverse is an open source web application to share, preserve, cite, explore and analyze research data.
  • Kepler - Kepler is a scientific workflow modelling and management system that enables users, regardless of programming experience, to set up data analysis pipelines.
  • LabTrove - LabTrove is a blogging platform specifically designed for use in a research environment.
  • MyExperiment - myExperiment is an online social networking service aimed at scientific researchers; the site fosters collaboration by allowing members to share scientific workflows, experiment plans, and other digital objects.
  • Taverna - Taverna is a scientific workflow management system designed to assemble, run, document and share sequences of web services and scripts.
  • WebCite - WebCite is an on-demand web archiving service that takes snapshots of Internet-accessible digital objects at the behest of users, storing the data on their own servers and assigning unique identifiers to those instances of the material.

Persistent Identification

Tools that support the unique and persistent identification of files or intellectual entities.

  • DataCite - DataCite works with data centres to assign persistent identifiers to datasets using the Digital Object Identifier (DOI) infrastructure.
  • EZID - EZID (easy-eye-dee) makes it easy to create and manage unique, persistent identifiers.
  • WebCite - WebCite is an on-demand web archiving service that takes snapshots of Internet-accessible digital objects at the behest of users, storing the data on their own servers and assigning unique identifiers to those instances of the material.

Storage

Tools that support the storage of digital resources, possibly in multiple locations to avoid loss of data due to hardware or other failures.

  • Amazon Cloud - Amazon Cloud is an internet-based storage location designed to hold files indefinitely.
  • CERN Advanced STORage manager (CASTOR) - CASTOR, which stands for the CERN Advanced STORage manager, is a hierarchical storage management (HSM) system developed at CERN used to store physics production files and user files.
  • Carbonite - an online backup service that automatically backs up documents, e-mails, music, photos, and settings. Info gathered early March 2013.
  • Chronopolis - "Chronopolis digital preservation network provides services for the long-term preservation and curation of America's digital holdings"
  • DCape (ingest only) - "The goal of the DCAPE project is to build a distributed production preservation environment that meets the needs of archival repositories for trusted archival preservation services." (Note: This is a work in progress, see notes for more information)
  • Data Vault - A storage broker and front end for archiving research data that is no longer active but that does not have a need for open publication
  • Dataverse - The Dataverse is an open source web application to share, preserve, cite, explore and analyze research data.
  • Dropbox - Dropbox is a free service for keeping photos, documents, and videos available across devices: any file saved to your Dropbox is automatically saved to all your computers, phones and the Dropbox website. Dropbox also supports sharing files with others, and the synchronised copies remain available if a single device is damaged or lost.
  • Glacier (Amazon) - Amazon Glacier is a secure, durable, and extremely low-cost cloud storage service for data archiving and long-term backup.
  • Google Cloud - Google Cloud Storage allows users to store, access, and manage their data.
  • Hoppla - Hoppla is an archiving solution that combines back-up and fully automated migration services for data collections in small office environments.
  • IRODS (integrated Rule Oriented Data Systems) - iRODS software was designed to allow curators utilising heterogeneous storage and computing facilities to define policies without being concerned with the technical detail of how the system implements those policies and without having to respond to changes in technical infrastructure.
  • LOCKSS (Lots of Copies Keep Stuff Safe) - LOCKSS software allows libraries to create preserved digital collections out of materials that would otherwise be accessible only through a licensed academic subscription.
  • Legacy Locker - Legacy Locker is a safe, secure repository for your vital digital property that lets you grant access to online assets for friends and loved ones in the event of loss, death, or disability.
  • RackSpace - RackSpace provides cloud-based services to businesses of all sizes throughout the world.
  • The DICE Storage Resource Broker (SRB) - The DICE Storage Resource Broker (SRB) supports shared collections that can be distributed across multiple organizations and heterogeneous storage systems.
  • The aDORe Federation - The aDORe Federation is a federated repository framework and reference implementation which aims to address many of the scalability issues experienced by large scale digital object repositories.