HOW WE ARCHIVE

METHODS & TOOLS

Methodology

The methodology used by the Rohingya Genocide Archive to collect and archive content was developed collaboratively between project partners starting in early 2018. We developed the policies and procedures for each step of the methodology during intensive workshops that included conversations, training, and collaborative policy writing.

Here is a basic overview of RGA’s collection methodology and tools, including some “archiving tips” aimed at groups that may be considering starting their own archival initiatives.

Sources and Selection
Downloading and Collecting
Packaging and Preparing for Storage
Cataloging
Storage and Backup
Access

SOURCES AND SELECTION

RGA determined the main types of sources it would collect from (e.g. primary sources, Rohingya diaspora sharing on social media, etc.) to meet its goals. RGA then created criteria for assessing the credibility of sources within each category (e.g. whether they were vetted by RVision, the source’s history of activities, etc.). These criteria were used to generate lists of credible sources that RGA prioritized and collected from. In line with its goals, RGA also developed a selection policy that describes the scope of the collection based on content types, subject matter, geographic and temporal ranges, and other specific criteria.

Archiving tip: It is essential to create agreed-upon selection criteria and policies based on the purpose and goals of the archive. Write them down. Written policies provide an ongoing reference that helps prioritize work, manage limited resources, and ensure that the collection serves the goals of the archive.

DOWNLOADING AND COLLECTING

RGA follows different technical procedures for downloading and collecting content depending on the platform or storage device it originates from (e.g. Twitter, Facebook, hard drive, etc.). For each platform or device, the procedure involves collecting the content in the highest available quality (original if possible) as well as metadata that identifies the files’ provenance.

Archiving tip: To ensure transparency, RGA uses open-source tools or tools provided by source content platforms whenever possible, including youtube-dl, twurl, WhatsApp chat export, and platform APIs. One challenge is that technology is always changing and tools can break or become unavailable over time, so procedures need to be periodically updated.
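As an illustration of collecting a video in the highest available quality together with its provenance metadata, here is a minimal Python sketch that builds a youtube-dl command line. It is not RGA’s actual procedure: the function name build_download_command, the destination folder, and the example URL are assumptions made for this example. The youtube-dl options used are standard ones, and merging separate video and audio streams requires ffmpeg to be installed.

```python
from pathlib import Path


def build_download_command(url: str, dest_dir: str) -> list[str]:
    """Return a youtube-dl argument list that keeps best quality plus metadata.

    Hypothetical helper for illustration; adjust options to your own policy.
    """
    # Name files by the platform's own video ID so each file can be traced
    # back to its source.
    output_template = str(Path(dest_dir) / "%(id)s.%(ext)s")
    return [
        "youtube-dl",
        "--format", "bestvideo+bestaudio/best",  # highest available quality
        "--write-info-json",    # platform metadata (uploader, date, URL) as JSON
        "--write-description",  # video description as a text file
        "--write-thumbnail",    # thumbnail image
        "--output", output_template,
        url,
    ]


if __name__ == "__main__":
    # Placeholder URL: replace with a real source before running.
    command = build_download_command("https://example.org/watch?v=abc", "downloads")
    print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True). Keeping the command in a small script means the exact options used for each download are documented and can be updated when the tool changes.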

PACKAGING AND PREPARING FOR STORAGE

RGA prepares content for storage by “wrapping” it into packages. Each package is a uniquely named folder (following the RGA naming scheme) that contains the collected content (e.g. the video and its metadata) arranged in a standardized way, along with a manifest or “packing list” of the files and their MD5 hashes.

Archiving tip: RGA uses the open-source tool Exactly to create BagIt packages. Packaging helps to ensure that content can be easily identified and validated for authenticity and integrity, and facilitates good organization of content in storage.
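To show what such a package looks like on disk, here is a minimal Python sketch that builds a BagIt-style folder (a data directory, a bagit.txt declaration, and a manifest-md5.txt packing list) and later validates it. This is an illustration using only the Python standard library, not how Exactly works internally; the function names and the package name used in the example are assumptions.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Compute the MD5 hash of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_package(source_files: list[Path], package_dir: Path) -> Path:
    """Copy files into package_dir/data and write the BagIt tag files."""
    data_dir = package_dir / "data"
    data_dir.mkdir(parents=True)  # fails if the package already exists
    lines = []
    for src in source_files:
        dest = data_dir / src.name
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        lines.append(f"{md5_of(dest)}  data/{dest.name}")
    (package_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (package_dir / "manifest-md5.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return package_dir


def validate_package(package_dir: Path) -> bool:
    """Return True if every file listed in the manifest exists and matches."""
    manifest = (package_dir / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = package_dir / rel_path
        if not target.is_file() or md5_of(target) != expected:
            return False
    return True
```

For example, make_package([Path("video.mp4"), Path("video.info.json")], Path("ITEM-0001")) creates the folder ITEM-0001 (a hypothetical name), and validate_package(Path("ITEM-0001")) can be re-run at any time to confirm that nothing in the package has changed.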

CATALOGING

RGA records the collected content in a catalog to enable identification, analysis, search, and retrieval of items in the collection. The catalog provides provenance, reference, and basic descriptive information that is meant to be granular and unambiguous, and deliberately avoids extensive analysis or interpretation.

Archiving tip: To ensure consistent and accurate cataloging, RGA developed and uses a data dictionary as a reference. The data dictionary provides a detailed definition of each column in the catalog spreadsheet, along with guidance and rules for how data should be entered.
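One way a data dictionary can support consistent entry is by expressing some of its rules in a form that a script can check. The Python sketch below validates a single catalog row against a small dictionary of rules. The column names, allowed values, and identifier pattern are all hypothetical and do not reflect RGA’s actual data dictionary.

```python
import re

# Hypothetical data dictionary: one entry per catalog column.
DATA_DICTIONARY = {
    "item_id": {"required": True, "pattern": r"ITEM-\d{4}"},
    "source_platform": {
        "required": True,
        "allowed": {"Facebook", "Twitter", "WhatsApp", "YouTube", "Hard drive"},
    },
    "date_collected": {"required": True, "pattern": r"\d{4}-\d{2}-\d{2}"},
    "source_url": {"required": False},
}


def validate_row(row: dict[str, str]) -> list[str]:
    """Return a list of problems found in one catalog row (empty if none)."""
    errors = []
    for column, rule in DATA_DICTIONARY.items():
        value = row.get(column, "").strip()
        if not value:
            if rule.get("required"):
                errors.append(f"{column}: required value is missing")
            continue
        allowed = rule.get("allowed")
        if allowed is not None and value not in allowed:
            errors.append(f"{column}: '{value}' is not an allowed value")
        pattern = rule.get("pattern")
        if pattern and not re.fullmatch(pattern, value):
            errors.append(f"{column}: '{value}' does not match the expected format")
    # Flag columns that the data dictionary does not define.
    for column in row:
        if column not in DATA_DICTIONARY:
            errors.append(f"{column}: column is not defined in the data dictionary")
    return errors


if __name__ == "__main__":
    row = {"item_id": "ITEM-0001", "source_platform": "Myspace", "date_collected": "2018/03/01"}
    for problem in validate_row(row):
        print(problem)
```

A check like this does not replace the written data dictionary, which still carries the definitions and guidance that people need, but it can catch typos and formatting drift before they accumulate in the catalog.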

STORAGE AND BACKUP

RGA stores and backs up the collection in a physically secure location and additionally maintains an offsite cloud backup. Access to the stored items and to the catalog is limited to project participants, unless separate copies are provided for access (see below).

Archiving tip: Local backup should be on a device separate from the primary storage to mitigate loss in case of hardware failure. Offsite backup, whether on another device or a cloud storage service, is also important to mitigate loss in case of on-premises incidents such as fire, flood, or theft.
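A backup is only useful if it actually matches the primary copy, so it can help to compare the two periodically. The Python sketch below walks a primary folder and a backup folder, hashes every file, and reports anything missing, extra, or different. It is a generic illustration rather than RGA’s procedure; the function names are assumptions, and SHA-256 is chosen here only for the example.

```python
import hashlib
from pathlib import Path


def checksum_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    checksums = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        checksums[path.relative_to(root).as_posix()] = digest.hexdigest()
    return checksums


def compare_backup(primary: Path, backup: Path) -> dict[str, list[str]]:
    """Report files that are missing, extra, or different in the backup."""
    primary_sums = checksum_tree(primary)
    backup_sums = checksum_tree(backup)
    shared = primary_sums.keys() & backup_sums.keys()
    return {
        "missing_from_backup": sorted(primary_sums.keys() - backup_sums.keys()),
        "extra_in_backup": sorted(backup_sums.keys() - primary_sums.keys()),
        "mismatched": sorted(p for p in shared if primary_sums[p] != backup_sums[p]),
    }
```

Calling compare_backup(Path("/archive/primary"), Path("/mnt/backup")) (both paths are placeholders) returns three lists; all three being empty means the backup matches the primary copy file for file.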

ACCESS

To date, RGA has provided access to portions of its catalog and collection to selected stakeholders on a one-on-one basis for the purposes of pursuing justice and accountability. Rather than providing direct access to stored contents, RGA produces copies and delivers them to users via encrypted transfer. 

Archiving tip: Tresorit is a service that RGA has used to securely transfer content to an intended user. It is a commercial service based in Switzerland that provides zero-knowledge end-to-end encrypted file sharing.