
Research Data Management

A comprehensive guide to the best practices for planning, collecting, working with, sharing and reusing research data

Storage, Archival & Backup


A sound storage and backup strategy is key to preventing data loss and to ensuring the long-term availability of data for future reuse.

File formats – how to choose file formats to ensure access over time?

File formats can become obsolete or unusable over time, so you need to choose a format that supports long-term access to the data. To ensure long-term sustainability and accessibility, an ideal file format should be:

  • widely used, to maximize reusability
  • open source, with publicly available specifications
  • non-proprietary and compatible with multiple software applications

Source: AILLA Archive of the Indigenous Languages of Latin America. (2018, June 11). Sustainable File Types [Video]. YouTube. https://youtu.be/2JCpg6ICr8M

To preserve your research data for long-term use, we recommend saving it in the following formats, as suggested by the UK Data Service and DANS:


Type of data

Recommended formats

Acceptable formats


Textual data

  • Rich Text Format (.rtf)
  • Plain text, ASCII (.txt)
  • Hypertext Mark-up Language (.html)
  • Widely-used formats: MS Word (.doc/.docx)

Document data

  • Rich Text Format (.rtf)
  • PDF/UA, PDF/A or PDF (.pdf)
  • XHTML or HTML (.xhtml, .htm)
  • OpenDocument Text (.odt)
  • Plain text (.txt)
  • Widely-used formats: MS Word (.doc/.docx), MS Excel (.xls/.xlsx)
  • XML marked-up text (.xml) according to an appropriate DTD or schema, e.g. XHTML 1.0

Image data

  • TIFF 6.0 uncompressed (.tif)
  • JPEG (.jpeg, .jpg, .jp2) if original created in this format
  • GIF (.gif)
  • TIFF other versions (.tif, .tiff)
  • RAW image format (.raw)
  • Photoshop files (.psd)
  • BMP (.bmp)
  • PNG (.png)
  • Adobe Portable Document Format (PDF/A, PDF) (.pdf)

Audio data

  • Free Lossless Audio Codec (FLAC) (.flac)
  • MPEG-1 Audio Layer 3 (.mp3) if original created in this format
  • Audio Interchange File Format (.aif)
  • Waveform Audio Format (.wav)

Video data

  • MPEG-4 (.mp4)
  • OGG video (.ogv, .ogg)
  • motion JPEG 2000 (.mj2)
  • AVCHD video (.avchd)

Spreadsheets

  • ODS (.ods)
  • CSV (.csv)
  • Microsoft Excel (.xls)
  • Office Open XML Workbook (.xlsx)
  • PDF/A (.pdf)

Statistical data

  • SPSS (.dat/.sps)
  • STATA (.dat/.do)
  • R
  • SPSS Portable (.por)
  • SPSS (.sav)
  • STATA (.dta)
  • SAS (.7dat; .sd2; .tpt)
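Before archiving, it can help to check a project folder for files whose formats do not appear in the table above. The following is a minimal sketch in Python, assuming a hypothetical helper `unlisted_files` and a hand-built set of extensions copied from the table. It does not separate recommended from acceptable formats, it cannot tell PDF/A from ordinary PDF by extension alone, and because R is listed without an extension, R files will be flagged for manual review.

```python
from pathlib import Path

# Hypothetical lookup: every file extension named in the format table above.
# It does not separate recommended from acceptable formats, and an extension
# alone cannot tell PDF/A from ordinary PDF.
LISTED_EXTENSIONS = {
    # Textual and document data
    ".rtf", ".txt", ".html", ".htm", ".xhtml", ".xml", ".pdf", ".odt", ".doc", ".docx",
    # Image data
    ".tif", ".tiff", ".jpeg", ".jpg", ".jp2", ".gif", ".raw", ".psd", ".bmp", ".png",
    # Audio data
    ".flac", ".mp3", ".aif", ".wav",
    # Video data
    ".mp4", ".ogv", ".ogg", ".mj2", ".avchd",
    # Spreadsheets
    ".ods", ".csv", ".xls", ".xlsx",
    # Statistical data (R is listed without an extension, so it is not included)
    ".dat", ".sps", ".do", ".por", ".sav", ".dta", ".7dat", ".sd2", ".tpt",
}


def unlisted_files(folder: str) -> list[Path]:
    """Return files under `folder` (searched recursively) whose extension
    does not appear in LISTED_EXTENSIONS, sorted by path."""
    root = Path(folder)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() not in LISTED_EXTENSIONS
    )
```

Calling `unlisted_files("path/to/project")` returns a list of candidates to convert or document; extension matching is case-insensitive, so `fig.PNG` counts as a PNG file.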

During a research project, you will need temporary storage for working files. After the project is completed, you will also need long-term storage for archiving data with high potential for reuse. The overview below shows a range of storage options suitable for active data and for long-term preservation. Keep in mind that you should use at least two different storage options to avoid accidental data loss.


Portable Storage

  • Examples: USB drives, external hard drives, memory cards
  • Working data: for transporting data only
  • Archival: No
  • Advantages: portable; encryption for sensitive data
  • Limitations: short lifespan; most are unsecured; easily lost, misplaced or stolen

Local Storage

  • Examples: desktop computers, laptops
  • Working data: Yes
  • Archival: Yes
  • Advantages: convenient; allows syncing to cloud storage
  • Limitations: risk of data corruption; risk of loss, theft or failure

Lingnan Networked Disk Storage

  • Examples: Network Home Folder (H:), common share drive for departments (S:)
  • Working data: Yes
  • Archival: depends on data size

Cloud Storage

  • Examples: OneDrive (Lingnan users are provided with 5TB of storage)
  • Working data: Yes
  • Archival: Yes
  • Advantages: suitable for collaboration with external research partners; enables co-authoring on files; accessible from any internet-connected device; regular backup
  • Limitations: not suitable for sensitive or confidential data

Data Repository Storage

  • Working data: No
  • Archival: Yes
  • Advantages: suitable for sharing data publicly; helps you comply with funder or publisher requirements
  • Limitations: only suitable for preserving data after the end of a research project; some are available to subscribers only

A sound backup strategy helps you safeguard your research data against threats such as software or hardware failure, human error and virus infection. Here are the best practices for building an effective backup strategy:


3-2-1 Rule

We recommend following the 3-2-1 Rule, which suggests that you:

  1. keep 3 copies of your data
  2. on 2 different storage media
  3. with 1 copy offsite
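The rule can be expressed as three simple checks on a backup plan. The following is a minimal sketch in Python, assuming a hypothetical `Copy` record and `check_3_2_1` function; the storage labels are examples drawn from the overview above. It counts the working copy as one of the three, which is the usual reading of the rule, and whether a copy counts as "offsite" is a judgement you record by hand.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One copy of the data in a (hypothetical) backup plan."""
    location: str  # where the copy lives, e.g. "Laptop" or "OneDrive"
    medium: str    # kind of storage medium, e.g. "local disk", "cloud"
    offsite: bool  # True if kept away from the main working location


def check_3_2_1(copies: list[Copy]) -> list[str]:
    """Return the ways a plan breaks the 3-2-1 Rule (empty list = compliant)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"{len(copies)} copies: keep at least 3")
    media = {copy.medium for copy in copies}
    if len(media) < 2:
        problems.append(f"{len(media)} storage medium type(s): use at least 2")
    if not any(copy.offsite for copy in copies):
        problems.append("no offsite copy: keep at least 1")
    return problems


plan = [
    Copy("Laptop", "local disk", offsite=False),
    Copy("Network Home Folder (H:)", "network drive", offsite=False),
    Copy("OneDrive", "cloud", offsite=True),
]
print(check_3_2_1(plan))  # []
```

A plan with only a laptop and a USB stick both labelled "local disk" would fail all three checks.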

Regular Backups & Tests

Schedule backups at regular intervals, such as daily or weekly. When scheduling backups, consider:

  • how often do you change your data?
  • how much data are you prepared to lose?
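The interval between backups is the most work you could lose, so the two questions above translate directly into a schedule. The following is a minimal sketch in Python, assuming a hypothetical `snapshot` function that copies a folder into a timestamped backup folder and keeps only the most recent `keep` snapshots. It only performs one backup run; running it regularly would be left to a scheduler such as cron (Linux/macOS) or Task Scheduler (Windows). To protect against disk failure, `backup_root` should sit on a different storage medium from `source`.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot(source: str, backup_root: str, keep: int = 7) -> Path:
    """Copy the folder `source` into a new timestamped folder under
    `backup_root`, then delete the oldest snapshots so at most `keep` remain.

    `backup_root` should be dedicated to these snapshots: any folder in it whose
    name starts with "<source name>-" is treated as a snapshot and may be deleted.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    src = Path(source)
    root = Path(backup_root)
    root.mkdir(parents=True, exist_ok=True)

    # Zero-padded timestamp (down to microseconds) so names sort chronologically.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    dest = root / f"{src.name}-{stamp}"
    counter = 1
    while dest.exists():  # guard against two runs with the same timestamp
        dest = root / f"{src.name}-{stamp}-{counter}"
        counter += 1
    shutil.copytree(src, dest)

    # Prune: oldest names sort first, so drop everything except the last `keep`.
    snapshots = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name.startswith(f"{src.name}-")
    )
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return dest
```

With a daily schedule and the default `keep=7`, you would have a week of history and could lose at most one day of changes.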

Apart from making regular backups, you should also test your backups periodically to make sure you can actually restore your data in the event of an incident. You can use checksum tools such as MD5Summer to verify the integrity of backed-up files and confirm that no corruption occurred during the backup process.
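The same kind of checksum comparison can also be scripted. The following is a minimal sketch in Python that uses the standard `hashlib` module as a stand-in for a tool like MD5Summer (it is not MD5Summer itself); `file_checksum` and `verify_backup` are hypothetical names. It hashes each file in the source folder and the file at the same relative path in the backup, and reports anything missing or different. MD5 is enough to detect accidental corruption; passing `algorithm="sha256"` works the same way.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: str, backup: str, algorithm: str = "md5") -> list[str]:
    """Compare every file under `source` with the file at the same relative path
    under `backup`; return the ones that are missing or whose checksums differ."""
    src_root, bak_root = Path(source), Path(backup)
    problems = []
    for src_file in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src_root)
        bak_file = bak_root / rel
        if not bak_file.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_checksum(src_file, algorithm) != file_checksum(bak_file, algorithm):
            problems.append(f"checksum mismatch: {rel.as_posix()}")
    return problems
```

An empty result from `verify_backup("project", "backup_copy")` means every source file has an identical copy in the backup; files that exist only in the backup are not reported.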

Learning Resources

Knowledge clip: Data Security

In this knowledge clip, we have a look at data security. What are the core information security principles? What can put your data at risk and which measures can you take to mitigate these risks?

Source: UGent Data Stewards. (2020, October 15). Knowledge clip: Data Security [Video file]. YouTube. https://youtu.be/8JyZ9F_zmPw