Research Guides: Research Data Management: Managing & Naming Files

Managing your files

Research data are too valuable to risk them becoming inaccessible or unusable because of avoidable problems, such as software obsolescence or missing documentation of the codes and metadata used to describe your data and of the scripts and programs used to process or clean them.

Key considerations:

Use open file formats rather than formats from proprietary software whenever possible.
- Examples include: .csv for spreadsheets; .gif, .png, .jpeg for images; .txt, .html, .pdf for textual data.
- See "Using open file formats helps future-proof your data" from UBC Library for more information.
Provide a data dictionary or codebook that defines any codes, terminology, or abbreviations used, so that external users, including those outside your discipline, can make sense of your data. For more information about data documentation, see our guide to Metadata for Researchers.
Include all the scripts and source code you used to analyze or process your data, along with any other documentation an external user would need to understand and replicate your processes.
- Project TIER has a very useful guide to best practices for project folder and file structure, including when and where to include ReadMe files, codebooks/data dictionaries, documentation of your script processes, and more.
- Social Science Data Editors have released a very helpful template for ReadMe files for social science datasets.

If it's not possible to use an open file format or open software, carefully consider your options for moving your data to a new program if the original becomes obsolete or inaccessible.

Having a strategy for moving your work to new platforms will help ensure that you can fulfill your obligation to archive and preserve your data in a usable form for the long term.
Record this strategy in your data management plan (DMP) so you can refer to it later.

To learn more, see:

The Research Data Management Workbook - Briney, K. Caltech Libraries [Workbook]
- a collection of exercises for researchers to improve their data management. The Workbook contains exercises across the data lifecycle, including writing project-level ReadMe files and data dictionaries, setting up file organization and naming systems, choosing storage and backup solutions, and much more.
The UK Data Archive's recommended formats for data sharing, reuse and preservation
UBC Library Research Data Management:
- Document & Describe
- File Formats

File Naming

One of the simplest ways to ensure that your data will be findable, accessible and reusable is to adopt a consistent file and folder naming system. Include a description of your naming strategy with your datasets when you archive them or make them publicly available.

Best Practices

Document your folder names and structure in a ReadMe file. Outline the content and organization of each folder, who the file authors and contributors are, and any changes you make (recorded as you make them), along with anything else a third party needs to know to understand what each version of your data represents.
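If your project already has many folders, a short script can draft the folder list for your ReadMe so that you only need to add the descriptions. The sketch below assumes Python 3 and uses only the standard library; `folder_outline` is a hypothetical helper name, and the indented layout is just one possible style.

```python
from pathlib import Path


def folder_outline(root: str) -> str:
    """Return an indented outline of the folders and files under root.

    Hypothetical helper for drafting the folder-structure section of a ReadMe.
    Folders are listed before files, each group alphabetically, so the
    outline stays the same between runs.
    """
    root_path = Path(root).resolve()
    lines = [root_path.name + "/"]

    def walk(folder: Path, depth: int) -> None:
        entries = sorted(folder.iterdir(), key=lambda p: (not p.is_dir(), p.name.lower()))
        indent = "    " * depth
        for entry in entries:
            if entry.is_dir():
                lines.append(f"{indent}{entry.name}/")
                walk(entry, depth + 1)  # descend into the subfolder
            else:
                lines.append(f"{indent}{entry.name}")

    walk(root_path, 1)
    return "\n".join(lines)
```

For example, `print(folder_outline("MyProject"))` prints an outline of a hypothetical `MyProject` folder, which you can paste into your ReadMe and annotate folder by folder.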

Make a copy of your original (raw) dataset and save it as a read-only, password-protected file. This prevents anyone from accidentally overwriting or deleting it, and gives you the clearest possible version history as you work through the data processing and analysis stages.
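A read-only copy can also be made with a script, which is handy when raw files arrive in batches. This minimal Python sketch copies a file and removes its write permissions; the helper name `make_read_only_copy` and the `raw_archive` folder used below are hypothetical. It does not add password protection, which depends on your storage platform or file format, and on Windows Python's `chmod` only sets the file's read-only attribute.

```python
import shutil
import stat
from pathlib import Path


def make_read_only_copy(raw_file: str, archive_dir: str) -> Path:
    """Copy a raw data file into archive_dir and mark the copy read-only.

    Hypothetical helper: it does NOT add password protection.
    """
    source = Path(raw_file)
    destination_dir = Path(archive_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    destination = destination_dir / source.name
    if destination.exists():
        # Refuse to replace a copy that has already been preserved.
        raise FileExistsError(f"{destination} already exists")
    # copy2 copies the contents plus metadata such as the modification time.
    shutil.copy2(source, destination)
    # Leave only read permissions for owner, group and others.
    destination.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return destination
```

For example, `make_read_only_copy("20220207_MyprojectData_raw.csv", "raw_archive")` would place a protected copy in `raw_archive`, while you continue working on separate, versioned files.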

Note that files in the Microsoft Cloud platform are auto-saved every few seconds. For this reason, it’s best to click the editing drop-down menu and then “Open in Desktop App” before beginning any work on your files.

Depending on the version you have, the desktop versions of Word, Excel and PowerPoint either do not save your edits until you click Save or Save As, or have a clearly labelled AutoSave (On/Off) toggle in the upper left of the application, making it much easier to control when your changes are saved.

Save each revision of your data as a new, numbered file (a version) every time you make an edit, instead of continually updating the original file, e.g., 20220207_MyprojectData_v1, 20220315_MyprojectData_v2, etc.
Keep a version history table or spreadsheet that briefly notes who created each revision and the purpose of the change(s). Include this table with your datasets when you publish or deposit them.
This provides an audit trail that explains and justifies the changes you make to your data over time.
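The version naming and history table described above can be partly automated. The sketch below assumes names of the form `YYYYMMDD_ProjectName_vN`, as in the examples above, and a CSV version history with the hypothetical columns `file_name`, `author` and `change`; both helper functions are illustrations, not part of any particular tool.

```python
import csv
import re
from datetime import date
from pathlib import Path

# Matches names like 20220315_MyprojectData_v2: date, project name, version number.
VERSION_PATTERN = re.compile(r"^(\d{8})_(.+)_v(\d+)$")


def next_version_name(current_name: str, edit_date: date) -> str:
    """Return the name for the next version, dated edit_date (v1 -> v2, etc.)."""
    match = VERSION_PATTERN.match(current_name)
    if match is None:
        raise ValueError(f"{current_name!r} does not follow YYYYMMDD_Name_vN")
    _, project, version = match.groups()
    return f"{edit_date:%Y%m%d}_{project}_v{int(version) + 1}"


def log_version(history_csv: str, file_name: str, author: str, change: str) -> None:
    """Append one row (file name, author, change) to a version history table."""
    path = Path(history_csv)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header row only when the table is first created.
            writer.writerow(["file_name", "author", "change"])
        writer.writerow([file_name, author, change])
```

For example, `next_version_name("20220207_MyprojectData_v1", date(2022, 3, 15))` returns `"20220315_MyprojectData_v2"`; you would then save your edited data under that name and call `log_version` to record who made the change and why.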

If a technical failure or some other problem affects a revision, you'll have previous versions to build from rather than having to regenerate your original data (if that's even possible).
- Be aware that DC's SharePoint system, including OneDrive and Teams, keeps its automatic file backups for only 30 days. Unless you make your own backups as described above, you will not be able to access older file versions.

Be consistent with your file name elements, such as:

date format: choose a single date format, such as ISO 8601 (YYYYMMDD), and use it consistently.
order of file name elements: choose a standard order and stick to it. For example, you might decide that file names always begin with the Date, followed by the DataType, the name of the Researcher who generated the data, and finally the Version Number, e.g., 20220326_TS_Lee_V7.
If you use any abbreviations or partial names, make sure your ReadMe file explains them, e.g., TS = transcript; Lee = Dr. Susannah Lee.
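To apply the element order consistently, you could generate names instead of typing them. This Python sketch assumes the Date_DataType_Researcher_Version order from the example above; `build_file_name` is a hypothetical helper, and rejecting spaces and underscores inside an element is an added assumption that keeps the underscore an unambiguous separator.

```python
from datetime import date


def build_file_name(when: date, data_type: str, researcher: str, version: int) -> str:
    """Join elements in a fixed order: Date_DataType_Researcher_Version.

    The date uses the ISO 8601 basic format YYYYMMDD.
    """
    for element in (data_type, researcher):
        # Underscores separate the elements, so they (and spaces) are not
        # allowed inside an element; empty elements are rejected too.
        if not element or "_" in element or " " in element:
            raise ValueError(f"invalid file name element: {element!r}")
    return "_".join([when.strftime("%Y%m%d"), data_type, researcher, f"V{version}"])
```

Here `build_file_name(date(2022, 3, 26), "TS", "Lee", 7)` returns `"20220326_TS_Lee_V7"`. Because the date comes first, an alphabetical listing of these files is also a chronological one.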

Don't use spaces or special characters such as & ? # . in your file names. Operating systems and other software may misinterpret them, leading to errors.

Instead, use a dash (-), an underscore (_), or CamelCase to separate your file name elements.
If possible, limit your file names to no more than 32 characters.

Periods should not appear in your file names except where your software adds one automatically before the file format extension, e.g., .jpg, .csv.
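Before sharing or depositing files, you can check existing names against these rules. The minimal Python sketch below is one interpretation of them: it treats whatever follows the last period as the format extension, allows only ASCII letters, digits, dashes and underscores otherwise, and counts the 32-character limit on the whole name, including the extension. `file_name_problems` is a hypothetical helper; adjust the rules to match your own ReadMe.

```python
from pathlib import PurePath

MAX_LENGTH = 32  # this guide's suggested limit, applied here to the whole name
ALLOWED_PUNCTUATION = "-_"  # separators recommended in this guide


def file_name_problems(file_name: str) -> list[str]:
    """Return the naming-rule problems found in file_name (empty list if none)."""
    # Treat everything after the last period as the format extension, e.g. ".csv".
    stem = PurePath(file_name).stem
    problems = []
    if len(file_name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if " " in stem:
        problems.append("contains spaces")
    if "." in stem:
        problems.append("has a period outside the file extension")
    # Anything other than ASCII letters, digits, dashes and underscores
    # (spaces and periods are reported above) counts as a special character.
    special = sorted(
        {
            ch
            for ch in stem
            if ch not in " ."
            and not (ch.isascii() and (ch.isalnum() or ch in ALLOWED_PUNCTUATION))
        }
    )
    if special:
        problems.append("contains special characters: " + " ".join(special))
    return problems
```

For example, `file_name_problems("my data?.v2.csv")` reports the space, the extra period and the `?`, while `file_name_problems("20220326_TS_Lee_V7.csv")` returns an empty list.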

To learn more, see:

The Research Data Management Workbook - Briney, K. Caltech Libraries [Workbook]
- a collection of exercises for researchers to improve their data management. The Workbook contains exercises across the data lifecycle, including writing project-level ReadMe files and data dictionaries, setting up file organization and naming systems, choosing storage and backup solutions, and much more.
Open Science Framework:
- File Naming
- Organizing Files
UBC Research Data Management: Organize (your data)
Harvard Research Data Management: Version Control
UK National Archives: Best Practices for File Naming

Research Data Management

File Names & Structure (5:42)

Data Management Errors to Avoid (2:29)
