For your data to be considered truly anonymous, your research subjects should not be identifiable by any reasonable means when your data are published.
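One simple, partial test of this standard for tabular data is k-anonymity. A dataset is k-anonymous if every combination of indirect identifiers (quasi-identifiers such as age band, partial postal code and sex) is shared by at least k records, so no one can be singled out by those attributes alone. The Python sketch below is a minimal illustration that assumes a small, made-up dataset and hypothetical column names. Passing the check does not by itself prove the data are anonymous: rare values in other columns, or a group whose members all share the same sensitive answer, can still reveal information about individuals.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records sharing identical quasi-identifiers."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def risky_groups(records, quasi_identifiers, k):
    """Return the quasi-identifier combinations shared by fewer than k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [combo for combo, n in sizes.items() if n < k]


# Hypothetical, made-up records; column names are illustrative only.
data = [
    {"age": "30-39", "postcode": "K1A", "sex": "F", "response": "agree"},
    {"age": "30-39", "postcode": "K1A", "sex": "F", "response": "disagree"},
    {"age": "40-49", "postcode": "K2P", "sex": "M", "response": "agree"},
]
qi = ["age", "postcode", "sex"]

print(k_anonymity(data, qi))        # 1: the third record is unique on its quasi-identifiers
print(risky_groups(data, qi, k=2))  # [('40-49', 'K2P', 'M')]
```

A result of 1 means at least one participant is unique on those attributes, so coarser generalization or suppression of the flagged records would be needed before sharing.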
If data deposit is appropriate AND you have data which must be de-identified:
Consider whether any of your non-anonymized private and/or sensitive data needs to be retained past the active research phase, and why. Include your rationale in your data management plan, participant release forms and funding application(s).
Depending on the nature of your project, it may not be possible to share your research data with others. If you do plan to share your data and your research involves human subjects, this needs to be clearly stated in your participant consent forms, along with all the measures you will take to protect participant privacy, including your anonymization techniques. You should also describe your anonymization measures in your data management plan and in the documentation and README file that will accompany your eventual data deposit. "For transparency, it should be clear how the dataset was modified to protect study participants" (Portage COVID-19 Working Group. De-identification Guidance).
Listed below are several of the most common strategies to mask or remove personally identifying information from your data. You will need to consider carefully which of them, if any, will work best for your situation.
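As an illustration, the Python sketch below applies three widely used de-identification strategies to a single survey record: suppression (dropping direct identifiers such as name and email), pseudonymization (replacing the study ID with a keyed hash) and generalization (reducing the precision of age and postal code). The column names, 10-year age bands, 3-character postal prefix and key are all hypothetical assumptions chosen for the example, not recommendations for any particular dataset. Note that pseudonymized data are generally not considered anonymous while the key or a linkage file still exists.

```python
import hashlib
import hmac

# Hypothetical column names: adjust to match your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 12 hex characters.

    The same value and key always give the same pseudonym, so records can still be
    linked; without the key, a guessed identifier cannot be re-hashed to check a match.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, key: bytes) -> dict:
    # Suppression: drop direct identifiers entirely.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Pseudonymization: replace the study ID with a keyed hash.
    out["participant_id"] = pseudonymize(record["participant_id"], key)
    # Generalization: the exact age becomes a 10-year band.
    out["age"] = age_band(record["age"])
    # Generalization: keep only the first three characters of the postal code.
    out["postcode"] = record["postcode"][:3]
    return out


# Hypothetical example record.
raw = {"participant_id": "P001", "name": "Jane Doe", "email": "jane@example.org",
       "phone": "555-0100", "age": 34, "postcode": "K1A 0B1", "response": "agree"}
print(deidentify(raw, key=b"keep-this-key-separate"))
```

A keyed hash is used rather than a plain hash because short study IDs can be brute-forced from an unkeyed hash; the key itself then needs to be stored securely, or destroyed if re-linking will never be required.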
"Files should not be destroyed thoughtlessly ... (as) it is often useful to keep ‘working’ files safe in order to backtrack in the research process ... (However), at the conclusion of the research, data files that are not required for preservation need to be disposed of securely" (UK Data Service. Disposal).
NOTE: Data stored in the cloud on non-Enterprise services may be difficult to delete permanently. Although the application may have a built-in delete function, it is not always clear whether the data are truly deleted from the application's servers.
Finnish Social Science Data Archive. Anonymisation and Personal Data: provides a comprehensive look at a range of anonymization techniques for both qualitative and quantitative data.
Irish Qualitative Data Archive (IQDA). Guide to Anonymising Qualitative Data: looks at five specific practices to follow to achieve "an appropriate level of anonymisation that will not lessen the use-value of the data."
Portage COVID-19 Working Group. De-identification Guidance: Prepared on behalf of the Canadian Association of Research Libraries (CARL), this guide aims to "help Canadian researchers minimize disclosure risk when sharing data collected from human participants" by providing guidance on removing direct identifiers, evaluating risk associated with indirect identifiers, strategies for qualitative data de-identification and more.
ARX: Data Anonymization Tool. Developed with support from several universities in Germany, ARX is "open source software for anonymizing sensitive personal data. It supports a wide variety of (1) privacy and risk models, (2) methods for transforming data and (3) methods for analyzing the usefulness of output data."
OpenAIRE: Amnesia. From the European Union's Open Science initiative, Amnesia is a "de-identification method that removes or replaces direct identifiers (names, ids, phone numbers, etc.) from a sensitive dataset."
UK Data Service: provides access to its free text anonymisation tool as a zipped download.
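For qualitative data such as interview transcripts, identifiers are embedded in free text. As a rough illustration (not a description of how any of the tools above work), the Python sketch below replaces a researcher-supplied list of names and places with neutral placeholders and masks email addresses and phone numbers by pattern. The name list, placeholders and patterns are hypothetical assumptions; regular expressions will miss indirect identifiers such as job titles, nicknames or distinctive events, so a manual read-through is still needed.

```python
import re

# Hypothetical mapping from real names/places in a transcript to neutral placeholders.
REPLACEMENTS = {"Jane Doe": "[participant]", "Ottawa": "[city]"}

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")  # North American-style numbers only


def redact(text: str, replacements: dict[str, str]) -> str:
    # Replace each known name or place, ignoring case, with its placeholder.
    for original, placeholder in replacements.items():
        text = re.sub(re.escape(original), placeholder, text, flags=re.IGNORECASE)
    # Mask contact details that follow a recognizable pattern.
    text = EMAIL.sub("[email]", text)
    text = PHONE.sub("[phone]", text)
    return text


transcript = "Jane Doe moved to Ottawa in 2019. Reach her at jane.doe@example.org or 613-555-0100."
print(redact(transcript, REPLACEMENTS))
# [participant] moved to [city] in 2019. Reach her at [email] or [phone].
```

Bracketed placeholders make each edit visible to future users of the data, which also helps document how the transcript was modified.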