
Research Data Management

Managing your files

Research data are too valuable to risk becoming inaccessible or unusable because of avoidable problems. Common examples are software obsolescence and missing documentation: the codes and metadata used to describe your data, and the scripts and programs used to process or clean it.

 

Key considerations:


 

  • Use open file formats rather than formats from proprietary software whenever possible. 

     
  • Provide a data dictionary or codebook so that external users - including those outside of your discipline - can make sense of your data, e.g.,  define any codes, terminology, or abbreviations used.  For more information about data documentation see our guide to Metadata for Researchers

     
  • Include all the scripts and/or source code that you used to analyze/process your data, and any other documentation an external user would need in order to replicate/understand your processes.
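
As one illustration of the data dictionary point above, here is a minimal sketch that writes a codebook to an open format (CSV) using only the Python standard library. The column names in CODEBOOK_FIELDS and the function name write_codebook are assumptions made for this example, not a required standard; adapt them to your discipline's conventions.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Hypothetical column layout for a codebook; adjust to suit your project.
CODEBOOK_FIELDS = ["variable", "description", "type", "allowed_values"]


def write_codebook(path: Path, entries: List[Dict[str, str]]) -> None:
    """Write one row per variable so outside users can interpret the data."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=CODEBOOK_FIELDS)
        writer.writeheader()
        writer.writerows(entries)
```

Each entry is a dictionary with one key per column, for example a variable name, a plain-language description, its type, and the codes it may contain. Saving the codebook as CSV keeps it readable without proprietary software.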

     

 

If it's not possible to use an open file format and/or software program, carefully consider your options for moving your data to a new program if the original becomes obsolete or inaccessible.

 

  • Having a strategy for shifting your work onto new platforms will ensure that you are able to fulfill your obligation to archive/preserve your data in a usable form for the long term.

     
  • Record your strategy in your DMP for your future convenience.

 

 

File Naming

  • One of the simplest ways to ensure that your data will be findable, accessible and reusable is to adopt a consistent file and folder naming system.  Include a description of your naming strategy with your datasets when you archive them or make them publicly available.

 

Best Practices


 

Document your folder names & structure in a ReadMe file.  Outline the content and organization of each folder, who the file authors/contributors are, and any changes you make, recorded as you make them, along with anything else a third party needs to know to understand what each version of your data represents.

 

Make a copy of your original/raw dataset and save it as a read-only, password-protected file. This prevents anyone from accidentally overwriting or deleting it, and provides the clearest possible version history for your data as you work through the data processing and analysis stages.
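
The read-only part of this step can be scripted. The following is a minimal sketch using the Python standard library; the function name preserve_raw_copy and the idea of a separate archive folder are assumptions for illustration. It does not add password protection, which depends on the tools available to you (for example, an encrypted archive or restricted folder permissions).

```python
import shutil
import stat
from pathlib import Path


def preserve_raw_copy(source: Path, archive_dir: Path) -> Path:
    """Copy a raw data file into archive_dir and mark the copy read-only."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    if target.exists():
        # Refuse to overwrite a copy that has already been preserved.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    # Clear the write permission for owner, group, and others.
    target.chmod(stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
    return target
```

On Windows, chmod can only toggle the file's read-only flag, so treat this as a guard against accidents rather than a security control.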
 

 

Note that files in the Microsoft Cloud platform are auto-saved every few seconds. For this reason, it’s best to click the editing drop-down menu and then “Open in Desktop App” before beginning any work on your files.
 

Depending on the version you have, the desktop versions of Word, Excel and PowerPoint either do not save your edits until you click "Save" or "Save As", or have a clearly labelled "Auto-save - ON/OFF" button in the upper left of the application, making it much easier to:
 

 

  • Save and number your data revisions as new files (i.e., versions) every time you make an edit, instead of continually updating the original file, e.g., 20220207_MyprojectData_v1;  20220315_MyprojectData_v2 etc.
     
  • Keep a version history table or spreadsheet that briefly notes who was responsible for creating each revision and the purpose/gist of the change(s).  Include this with your datasets when you publish / deposit them.
     
  • This provides an audit trail to explain/justify all the changes you make to your data over time.
     
  • In the event of a tech failure or some other problem with a revision, you'll have previous versions to build from rather than having to re-generate your original data - if that's even possible. 

     
    • Be aware that DC's SharePoint system - including its manifestations OneDrive & Teams - only retains its automatic file backups for 30 days, so unless you make your own backups as described above, you will not be able to access any older file versions.
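
The versioning and version-history practices above can be sketched in a few lines of Python. The function names, the .csv default extension, and the log columns are assumptions chosen for this example; the file name pattern follows the 20220315_MyprojectData_v2 example given earlier.

```python
import csv
from datetime import date
from pathlib import Path


def versioned_name(project: str, version: int, on: date, extension: str = ".csv") -> str:
    """Build a name such as 20220315_MyprojectData_v2.csv."""
    return f"{on:%Y%m%d}_{project}_v{version}{extension}"


def log_revision(log_path: Path, file_name: str, author: str, summary: str) -> None:
    """Append one row to the version history, adding a header on first use."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["file_name", "author", "summary"])
        writer.writerow([file_name, author, summary])
```

Calling versioned_name("MyprojectData", 2, date(2022, 3, 15)) gives "20220315_MyprojectData_v2.csv". The log file is itself a plain CSV, so it can be deposited alongside the datasets.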

       

Be consistent with your file name elements, such as:


 

  • date format: choose a single date format, such as the ISO 8601 basic format YYYYMMDD, and stick to it faithfully.
     
  • order of file name elements: choose a standard order and stick to it.  For example, you might decide that file names always begin with Date - then DataType - then name of the Researcher who generated the data  - and finally the Version Number, e.g., 20220326_TS_Lee_V7

     
  • If any abbreviations or partial names are used, ensure your ReadMe file explains them, e.g., TS = transcript | Lee = Dr. Susannah Lee etc. 
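
A fixed element order also makes file names machine-checkable. The sketch below assumes the example order used above (Date, DataType, Researcher, Version, joined by underscores); the names FileNameParts, build_name and parse_name are hypothetical and exist only for this illustration.

```python
import re
from datetime import datetime
from typing import NamedTuple


class FileNameParts(NamedTuple):
    date: str        # YYYYMMDD
    data_type: str   # e.g. TS for transcript
    researcher: str  # e.g. Lee
    version: int


# Date_DataType_Researcher_V<number>, without the file extension.
NAME_PATTERN = re.compile(r"^(\d{8})_([A-Za-z0-9]+)_([A-Za-z0-9]+)_V(\d+)$")


def build_name(parts: FileNameParts) -> str:
    """Join the elements in the agreed order."""
    return f"{parts.date}_{parts.data_type}_{parts.researcher}_V{parts.version}"


def parse_name(stem: str) -> FileNameParts:
    """Split a file name (without extension) back into its elements."""
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"{stem!r} does not follow the naming convention")
    date_text, data_type, researcher, version = match.groups()
    datetime.strptime(date_text, "%Y%m%d")  # raises ValueError for impossible dates
    return FileNameParts(date_text, data_type, researcher, int(version))
```

For example, parse_name("20220326_TS_Lee_V7") yields the date 20220326, data type TS, researcher Lee and version 7, while a name in a different order raises an error that flags the inconsistency.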

 

Don't use spaces or special characters such as & ? # . in your file names. Computer operating systems can misinterpret them, leading to errors.

 

  • Instead, use a dash (-), an underscore (_), or CamelCase to separate your file name elements. 
     
  • Limit your file names to no more than 32 characters if possible.
     

Periods should not be part of your file name other than where added automatically by your software application to indicate the file format extension, e.g., .jpg, .csv etc. 
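
The character, length and period rules above can be checked automatically before files are shared. This is a minimal sketch; the function name file_name_problems is hypothetical, and counting the extension toward the 32-character limit is an assumption, since the guidance does not say either way.

```python
import re
from typing import List

MAX_LENGTH = 32  # assumed here to include the extension
ALLOWED_STEM = re.compile(r"^[A-Za-z0-9_-]+$")


def file_name_problems(file_name: str) -> List[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    stem, dot, extension = file_name.rpartition(".")
    if not dot:
        # No period at all: the whole name is the stem.
        stem, extension = file_name, ""
    problems = []
    if not ALLOWED_STEM.match(stem):
        # Catches spaces, special characters, and extra periods in the stem.
        problems.append("use only letters, digits, dashes and underscores")
    if dot and not extension.isalnum():
        problems.append("extension is empty or contains special characters")
    if len(file_name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems
```

A name such as 20220326_TS_Lee_V7.csv returns an empty list, while a name containing spaces, an ampersand or a second period is reported.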
 
