
Safeguarding Research Data

Planning

For your data to be considered truly anonymous, your research subjects should not be identifiable by any reasonable means when your data are published.

 

Start by:

 


Next: 
 

If data deposit is appropriate AND you have data which must be de-identified:
 

  • Review your ethics approval and participant consent forms to ensure that your intended processes honour the commitments you've made with respect to the deposit, sharing/access restrictions, long-term retention, and/or future destruction of your research data.
     
  • Establish your timeline and the method(s) that will be used to permanently delete personally identifying information.
     
  • Determine who will be responsible for carrying out this work. 
     
  • Decide who will be in charge of verifying that the data de-identification strategies chosen are suitable and have been applied correctly.


Ask yourself:

 

Does any of your non-anonymized private and/or sensitive data need to be retained past the active research phase, and if so, why? Include your rationale for this in your data management plan, participant release forms and funding application(s).
 

  • What's your plan for keeping this information secure for the long term? 
     
  • For data that should be destroyed, how will you ensure that your deletion technique is permanent and reliable?

Masking or Removing PII

Depending on the nature of your project, it may not be possible to share your research data with others. If you do plan to share your data, and your research involves human subjects, this needs to be clearly articulated in your participant consent forms, along with all the measures you will take to protect participant privacy, including your anonymization techniques.

Additionally, you should include your anonymization measures in your data management plan as well as the documentation and README file that will accompany your eventual data deposit.  "For transparency, it should be clear how the dataset was modified to protect study participants" (Portage COVID-19 Working Group. De-identification Guidance).

 

Common Strategies

 

Listed below are several of the most common strategies to mask or remove personally identifying information from your data. You'll need to carefully consider which one(s), if any, will work best for your situation.

 

 

  • Anonymization occurs when all personally identifying information is permanently stripped from your datasets - including pseudonyms/coded references.  This must be done meticulously to ensure that nothing is accidentally overlooked or forgotten, e.g., in sub-folders, previous versions etc., and that deleted materials are truly unretrievable.  See the Deletion Options box below for more information.
     
    • Advantage: once the data are anonymized, individuals cannot be re-associated with the data "with reasonable effort based on the data provided or by combining the data with additional data points" (Finnish Social Science Data Archive. Anonymisation and Personal Data: Terms to Understand).
       
    • Disadvantage: once the data are anonymized, it is not possible to link individuals to their data across multiple datasets, which may limit the datasets' usefulness for other researchers.


 

  • Pseudonymization is when personal identifiers are replaced with pseudonyms or some other type of artificial identifier, such as a numeric code.
     
    •  Advantage: allows researchers to link pseudonymous data to the same individual across multiple datasets without exposing their identity.
       
    • Disadvantage: this technique is usually managed by having an identification key (or master list) stored in a different location, which could be hacked or exposed through human error. 
       
      • Allowing data to be linked to an individual across multiple datasets also increases the risk that subjects could be re-identified, particularly if multiple variables pertaining to the individual are available, e.g., profession + location + gender may in some circumstances be enough to identify a research participant.
         
      • To mitigate this risk the researcher could use different pseudonyms for different datasets, though this "may remove some analytical value" (UBC Library.  Pseudonymization).

         
  • Aggregation is when data from multiple individuals are combined and displayed in groupings, such as income or age ranges, geographic regions, etc. 
     
    • Advantage: Reduces the risk of re-identification of individual research participants as the data in each aggregated category are derived from multiple respondents, not specific individuals.
       
    • Disadvantage: Depending on how the data are aggregated and the degree to which they can be filtered by other variables, aggregated data may limit the type of analysis that external researchers can perform.
       
      • Additionally, the risk of re-identification is not zero. Similar to the risk posed by data pseudonymization techniques, even aggregate data may inadvertently reveal individual identities if they can be filtered by multiple variables.
         
      • e.g., can a user find out how many 18 to 35 year old men from your study have a PhD AND work in car sales AND reside in the Chilcotin region of BC? 

         
  • Data Swapping (also known as permutation) is when the value of a data variable for one research subject is swapped with the value for another.
     
    • Advantage: the distribution and dispersion of the swapped variable remain unchanged and the risk of associating individuals with their data is greatly decreased.
       
    • Disadvantage: "correlations between the variable and the values of other variables for the particular individual are lost" (Finnish Social Science Data Archive. Anonymisation and Personal Data).
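
Three of the strategies above (pseudonymization, aggregation and data swapping) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not a vetted de-identification procedure: the sample records, field names, age bands, the function names and the small-cell threshold of 3 are all hypothetical and would need to be replaced with choices appropriate to your own project and ethics approval.

```python
import random
import secrets
from collections import Counter

# Hypothetical sample records; every name and value is invented for illustration.
records = [
    {"name": "Participant A", "age": 19, "region": "Chilcotin", "income": 41000},
    {"name": "Participant B", "age": 34, "region": "Chilcotin", "income": 58000},
    {"name": "Participant C", "age": 52, "region": "Fraser Valley", "income": 73000},
    {"name": "Participant D", "age": 47, "region": "Fraser Valley", "income": 66000},
    {"name": "Participant E", "age": 28, "region": "Chilcotin", "income": 39000},
]


def pseudonymize(rows, id_field="name"):
    """Replace a direct identifier with a random code.

    Returns the coded rows plus the key (code -> identity). The key should be
    stored separately from the data, since anyone holding it can re-identify
    participants.
    """
    code_for = {}  # identity -> code, so repeat rows for one person share a code
    key = {}       # code -> identity
    coded = []
    for row in rows:
        identity = row[id_field]
        if identity not in code_for:
            code = "P-" + secrets.token_hex(4)
            while code in key:  # regenerate in the unlikely event of a collision
                code = "P-" + secrets.token_hex(4)
            code_for[identity] = code
            key[code] = identity
        new_row = {k: v for k, v in row.items() if k != id_field}
        new_row["pid"] = code_for[identity]
        coded.append(new_row)
    return coded, key


def age_band(age):
    """Map an exact age to a coarse range (band boundaries are an assumption)."""
    if age < 18:
        return "under 18"
    if age <= 35:
        return "18-35"
    if age <= 55:
        return "36-55"
    return "56+"


def aggregate(rows, min_cell=3):
    """Count participants per (age band, region) and flag small cells.

    Cells with fewer than min_cell people are returned separately because
    they carry a higher re-identification risk.
    """
    counts = Counter((age_band(r["age"]), r["region"]) for r in rows)
    small = {cell: n for cell, n in counts.items() if n < min_cell}
    return counts, small


def swap_variable(rows, field, seed=None):
    """Randomly permute one variable across participants.

    The set of values (and so its distribution) is unchanged, but the link
    between each value and the rest of an individual's record is broken.
    """
    rng = random.Random(seed)
    values = [r[field] for r in rows]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(rows, values)]
```

In this sketch, calling `aggregate(records)` flags the ("36-55", "Fraser Valley") cell because it contains only two participants. That is the same kind of small group the Chilcotin example warns about, and it would typically be merged into a broader category or suppressed before sharing.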

Deletion Options

"Files should not be destroyed thoughtlessly ... (as) it is often useful to keep ‘working’ files safe in order to backtrack in the research process. ... (However), at the conclusion of the research, data files that are not required for preservation need to be disposed of securely" (UK Data Service. Disposal).

 

Permanent deletion techniques to consider:
 

  • For locally saved files, e.g., on your College-issued laptop, desktop and/or mobile device: contact the IT service desk for assistance with permanently deleting the relevant files.
     
    • NOTE: "deleting" a file by using your computer's delete button or dragging a file into your computer trash does NOT remove the file from your computer and should not be the basis of your secure deletion strategy.
       
    • Many software deletion programs are designed for hard drives, not removable flash drives. Your best option for a flash drive may be to physically destroy it.
       
  • Demagnetize (degauss) magnetic tapes.
     
  • Physically destroy the storage media (e.g., burning, shredding, crushing).
     
  • Personal mobile devices should be wiped in accordance with the manufacturer's instructions, e.g., Samsung: https://www.samsung.com/latin_en/support/mobile-devices/how-do-i-delete-all-of-my-personal-information-from-my-device/
     
    • Note: wiping your device does not necessarily delete information stored on external memory such as SD cards. These will need to be wiped or destroyed independently of your other file-deletion actions.

 

Non-Enterprise Services

 

NOTE: Data stored in the cloud on non-Enterprise services may prove challenging to delete permanently. While there may be a delete function built into the application, it's not always clear whether the data are truly deleted from the application's servers. 
 

  • Data stored on Douglas College cloud services (SharePoint, OneDrive, Teams) are encrypted and are not accessible to anyone without the appropriate access/credentials. Once deleted, they become permanently inaccessible to anyone after 30 days.

Guidance & Tools

Anonymization Guidance

 

Finnish Social Science Data Archive. Anonymisation and Personal Data.  Provides a comprehensive look at a range of anonymization techniques - for both qualitative and quantitative data.

 

Irish Qualitative Data Archive (IQDA). Guide to Anonymising Qualitative Data: looks at five specific practices to follow to achieve "an appropriate level of anonymisation that will not lessen the use-value of the data."

 

Portage COVID-19 Working Group De-identification Guidance: Prepared on behalf of the Canadian Association of Research Libraries (CARL), this guide aims to "help Canadian researchers minimize disclosure risk when sharing data collected from human participants" by providing guidance on removing direct identifiers, evaluating risk associated with indirect identifiers, strategies for qualitative data de-identification and more.

 

Helpful Tools

 

ARX: Data Anonymization Tool. Developed with support from several universities in Germany, ARX is "open source software for anonymizing sensitive personal data. It supports a wide variety of (1) privacy and risk models, (2) methods for transforming data and (3) methods for analyzing the usefulness of output data."

 

OpenAire: Amnesia.  From the European Union's Open Science initiative, Amnesia is a "de-identification method that removes or replaces direct identifiers (names, ids, phone numbers, etc.) from a sensitive dataset."

 

UK Data Service: provides access to its free text anonymisation tool as a zipped download.