Guides @ LU: Research Data Management: Data Organization

Data Organization

Consistent, logical data organization is key to ensuring efficient access to your data throughout the research process and beyond. It is therefore crucial to organize your data in the following respects:

File structure – where to put data files so you can locate them?

A well-organized file structure lets you locate the files you need at a glance. The following tips will help you structure your data folders so that files and their versions are easy to locate and organize.

When planning for a good file structure, you should:

create a unique folder for the project - which will be the parent folder containing all files in the project
determine the scale of the project - how complex does the file structure need to be?
identify the parameters distinguishing data - someone looking at your files should be able to recognize those parameters when looking at both your file structure and filenames
assess the easiest way to access the data - this is often determined by the most important variable distinguishing the datasets
keep consistency within the project

A hierarchical folder structure organizes files into a "tree" of folders and subfolders, dividing them into categories such as subject, time, location or format. Here are some tips for building a good hierarchical folder structure:

The top-level folder should include the project information, such as the project title, project ID and date
Folder and subfolder names should reflect the content of the folder
- Avoid overlapping categories
Keep different versions of data files, i.e. raw data, edited data and finalized data, in separate folders to avoid accidental misuse
Consider the balance between breadth and depth
- Too many layers, or too many files in a folder, make it harder to find your files
- Keep the hierarchy to fewer than 4 levels and each folder to fewer than 10 items
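The tips above can be sketched as a small Python script that sets up a project folder and then checks it against the depth and size limits. This is a minimal sketch under our own assumptions: the `<date>_<ID>_<title>` pattern for the top-level folder, the subfolder names (`data/01_raw`, `02_edited`, `03_final`, `docs`) and the helper names `create_project_skeleton` and `audit_structure` are hypothetical, and we read "fewer than 4 levels" as at most three levels counting the project folder itself, and "fewer than 10 items" as at most nine entries per folder.

```python
import tempfile
from pathlib import Path

# Hypothetical stage folders keeping raw, edited and finalized data apart.
STAGES = ("01_raw", "02_edited", "03_final")


def create_project_skeleton(base: Path, date: str, project_id: str, title: str) -> Path:
    """Create a top-level project folder named <date>_<ID>_<title> with stage subfolders."""
    root = base / f"{date}_{project_id}_{title}"
    for stage in STAGES:
        (root / "data" / stage).mkdir(parents=True, exist_ok=True)
    (root / "docs").mkdir(exist_ok=True)
    return root


def audit_structure(root: Path, max_levels: int = 3, max_items: int = 9) -> list:
    """Return warnings for folders nested too deeply or holding too many entries."""
    warnings = []
    folders = [root] + [p for p in root.rglob("*") if p.is_dir()]
    for folder in folders:
        rel = folder.relative_to(root)
        level = len(rel.parts) + 1  # the project folder itself counts as level 1
        n_items = sum(1 for _ in folder.iterdir())
        if level > max_levels:
            warnings.append(f"too deep (level {level}): {rel}")
        if n_items > max_items:
            warnings.append(f"too many items ({n_items}): {rel}")
    return warnings


if __name__ == "__main__":
    project = create_project_skeleton(Path(tempfile.mkdtemp()), "20220201", "P001", "SoilSurvey")
    print(project.name)
    print(audit_structure(project) or "structure OK")
```

Running the audit from time to time, not only at setup, catches folders that drift past the limits as the project grows.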

It is important to document your file directory structure clearly, especially when you are collaborating with others. We recommend describing it, e.g. in a readme.txt file, including where each type of file should be stored, so that you and your collaborators keep records consistently over time. Please see here for more about data documentation.
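Part of such a readme can be generated from the folders themselves. The Python sketch below writes the folder tree into a readme.txt and appends a one-line note per folder that you supply; the helper names, the section titles and the example note are our assumptions, and the descriptions themselves still have to be written by hand.

```python
import tempfile
from pathlib import Path


def folder_tree(root: Path, indent: str = "    ") -> str:
    """Return an indented, alphabetically sorted listing of the folders under root."""
    lines = [root.name + "/"]

    def walk(folder: Path, level: int) -> None:
        for sub in sorted(p for p in folder.iterdir() if p.is_dir()):
            lines.append(f"{indent * level}{sub.name}/")
            walk(sub, level + 1)

    walk(root, 1)
    return "\n".join(lines)


def write_structure_readme(root: Path, descriptions: dict) -> Path:
    """Write readme.txt at the project root: the folder tree plus one note per folder."""
    body = ["FOLDER STRUCTURE", "", folder_tree(root), "", "WHAT GOES WHERE", ""]
    body += [f"{folder}: {note}" for folder, note in descriptions.items()]
    readme = root / "readme.txt"
    readme.write_text("\n".join(body) + "\n", encoding="utf-8")
    return readme


if __name__ == "__main__":
    project = Path(tempfile.mkdtemp()) / "20220201_P001_SoilSurvey"
    (project / "data" / "01_raw").mkdir(parents=True)
    notes = {"data/01_raw": "original instrument exports, read-only"}
    print(write_structure_readme(project, notes).read_text(encoding="utf-8"))
```

Regenerating the tree whenever folders are added keeps the readme in step with the actual structure.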

Watch this video to learn how to prepare a useful readme file:

Source: Harvard Library. (2020, September 3). Online Short-Seminar: What's in a README? [Video]. YouTube. https://youtu.be/RR1nhKhbabI

Not all data files fit neatly into a hierarchical structure. In such cases, you can add tags to the data files so that they can be found by searching for those tags. Read the LibGuide by MIT Libraries to learn how to tag your files.
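One portable way to tag files, independent of any operating system's built-in tagging feature, is a small index kept alongside the data. The Python sketch below stores tags in a JSON file; this is an illustrative alternative of our own, not taken from the MIT guide, and the index name `tags.json`, the helper names and the example files and tags are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def add_tags(index_file: Path, filename: str, *tags: str) -> None:
    """Record tags for a file in a JSON index, merging with any tags already stored."""
    index = json.loads(index_file.read_text(encoding="utf-8")) if index_file.exists() else {}
    index[filename] = sorted(set(index.get(filename, [])) | set(tags))
    index_file.write_text(json.dumps(index, indent=2), encoding="utf-8")


def find_by_tag(index_file: Path, tag: str) -> list:
    """Return the sorted filenames that carry the given tag."""
    index = json.loads(index_file.read_text(encoding="utf-8"))
    return sorted(name for name, tags in index.items() if tag in tags)


if __name__ == "__main__":
    index = Path(tempfile.mkdtemp()) / "tags.json"
    add_tags(index, "P001_intvw-01.wav", "audio", "site-a")
    add_tags(index, "P001_photo-07.jpg", "image", "site-a")
    print(find_by_tag(index, "site-a"))
```

Because the index is plain text, it can be shared with collaborators and kept under the same folder structure as the data it describes.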

You can also read the guide by the MIT Communication Lab to learn more about best practices for creating effective file structures.
Source: Diana, C. (2013). File Structure. Available online: https://mitcommlab.mit.edu/broad/commkit/file-structure/

File naming – what to name data files so you can identify them?

A good file name is short but descriptive, giving useful hints about the content, status and version of the file. Follow these rules to name files efficiently:

Keep the file names short but descriptive
- fewer than 25 characters
- file names should reflect the content, status and version of the files
Order the elements in a filename from general to specific
Avoid using spaces and special characters
- they may confuse some operating systems and programs
- use hyphens (-) or underscores (_) to separate elements if needed
Present dates in a standard format (YYYYMMDD)
- read more about the International Date and Time Data Standard
Include versioning within file names where appropriate
- Be descriptive and avoid ambiguity, e.g. use “-v02”, not “_final_final”, when versioning
Prepare a readme file about the naming convention, including any abbreviations or initials used in file names
Keep consistency within the project
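Several of these rules can be checked automatically. The Python sketch below builds a name and flags common problems; the element order (project, date, content, version), applying the 24-character limit to the name without its extension, and the helper names `build_filename` and `check_filename` are our assumptions rather than part of the rules, so adapt them to the naming convention documented in your readme.

```python
import re
from datetime import datetime

SAFE_CHARS = re.compile(r"[A-Za-z0-9_-]+")  # letters, digits, hyphen and underscore only
EIGHT_DIGITS = re.compile(r"(?<!\d)\d{8}(?!\d)")  # candidate YYYYMMDD dates


def build_filename(project: str, date: str, content: str, version: int, ext: str) -> str:
    """Join elements from general to specific, with a two-digit version number."""
    datetime.strptime(date, "%Y%m%d")  # raises ValueError if date is not a valid YYYYMMDD
    return f"{project}_{date}_{content}_v{version:02d}.{ext}"


def check_filename(name: str, max_len: int = 24) -> list:
    """Return a list of problems with a filename; an empty list means no rule was broken."""
    stem = name.rsplit(".", 1)[0]  # the extension is not counted against the length limit
    problems = []
    if len(stem) > max_len:
        problems.append(f"longer than {max_len} characters")
    if not SAFE_CHARS.fullmatch(stem):
        problems.append("contains spaces or special characters")
    for run in EIGHT_DIGITS.findall(stem):
        try:
            datetime.strptime(run, "%Y%m%d")
        except ValueError:
            problems.append(f"{run} is not a valid YYYYMMDD date")
    if "final" in stem.lower():
        problems.append("ambiguous 'final' label; use a version number such as v02")
    return problems


if __name__ == "__main__":
    print(build_filename("P001", "20220201", "soilpH", 2, "csv"))  # P001_20220201_soilpH_v02.csv
    print(check_filename("soil data final final.csv"))  # flags the spaces and the 'final' label
```

A check like this is most useful when run over a whole folder before files are shared, so that naming drift is caught early.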

Image source: Archiving for the Future. (n.d.). Filenaming tips. Available at: https://archivingforthefuture.teachable.com/courses/test-page/lectures/22161223

File versioning – how to track changes to data files?

It is good practice to keep multiple versions of your data files: this reduces the risk of corruption and data loss and lets you easily revert to a previous version of a file. To implement effective manual version control, you should:

Save raw data and milestone versions as read-only files, preferably in separate folders
Always remember to keep at least one copy of the original data
Use a systematic, consistent naming convention to name different versions of files
- Include version numbering or date in the file names, e.g. "Filename_v01", "Filename_20220201"
- Use at least two digits with a leading zero for your version numbering to enable correct sorting
- Extend the file name for minor changes when needed, e.g. "Filename_v01-01", "Filename_v01.01"
- Avoid ambiguous names such as "Filename_final_final"
- For collaborative projects, also include the author's initials to indicate who made changes to the file
Create an archive folder to separate old versions from master versions
- It is crucial to keep old versions in a separate folder to avoid mistakenly editing an outdated version
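Part of this manual routine can be scripted. The Python sketch below assumes file names ending in a `_vNN` suffix (as in "Filename_v01"), copies the current file to the next version number while keeping the leading zero, and moves the previous version into an archive folder with its write permission removed. Date-based names, minor versions such as "Filename_v01-01" and author initials are not handled, and the function names are hypothetical.

```python
import re
import shutil
import stat
import tempfile
from pathlib import Path

VERSION_SUFFIX = re.compile(r"_v(\d{2,})$")  # e.g. the "_v09" in "soilpH_v09"


def next_version_path(current: Path) -> Path:
    """Return the same name with the version number bumped, zero-padded to two digits."""
    match = VERSION_SUFFIX.search(current.stem)
    if match is None:
        raise ValueError(f"no _vNN suffix in {current.name}")
    number = int(match.group(1)) + 1
    return current.with_name(f"{current.stem[:match.start()]}_v{number:02d}{current.suffix}")


def save_new_version(current: Path, archive: Path) -> Path:
    """Copy the current file to its next version, then archive the old one as read-only."""
    new_path = next_version_path(current)
    archived = archive / current.name
    for target in (new_path, archived):
        if target.exists():
            raise FileExistsError(target)  # never overwrite an existing version
    shutil.copy2(current, new_path)  # the new version starts as an exact copy
    archive.mkdir(parents=True, exist_ok=True)
    shutil.move(str(current), str(archived))
    archived.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # read-only for everyone
    return new_path


if __name__ == "__main__":
    folder = Path(tempfile.mkdtemp())
    (folder / "soilpH_v09.csv").write_text("site,pH\nA,6.1\n")
    print(save_new_version(folder / "soilpH_v09.csv", folder / "archive").name)  # soilpH_v10.csv
```

The zero padding matters because plain text sorting places "v10" before "v2"; with two digits, "v02" correctly sorts before "v10".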

Learning Resources

Knowledge clip: Keeping research data organized

Keeping files organized during your research is a key aspect of data management. In this knowledge clip, we have a look at the different aspects of file organization (file naming, folder structure and version control), and provide tips and best practices.

Source: UGent Data Stewards. (2021, September 29). Knowledge clip: Keeping research data organized [Video]. YouTube. https://youtu.be/YslfY4W-NAg
