Data Documentation Best Practices: [from McGill University, 2:37]
Accompanying Documentation
What Else is Needed?
To be re-usable / replicable, data deposits should be accompanied by any additional documentation or information that a third party would need to open, understand, and/or analyze them.
For example:
Codebooks, also known as data dictionaries: these contain all codes, acronyms, specialized terms, etc., that were used to label your data variable fields.
Avoid providing these in Word table format, which is less convenient for re-use than a spreadsheet.
See also the guide to codebooks from the Data Documentation Initiative (DDI) for more detailed advice.
A data sources guide: this makes clear how the data were derived, by whom, and when, along with any other information a third party needs to make sense of your data, including:
explanation of survey instruments used
explanation of research methods
copy of questionnaire(s) used
if you aren't providing a codebook, definitions for any codes, abbreviations, and specialized terms
list of software needed to open/analyze the data, and the programs/scripts used to process and clean the data
any copyright, privacy, or other limits to accessing / re-using your data
explanation of the role each data contributor played if relevant, e.g., did different team members derive their data in different geographic regions, use different survey instruments, focus on different research subjects?
Version history logs: these include a brief recap of each revision to your original data, including who made the changes and why.
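As an illustration of the codebook and version history advice above, here is a minimal Python sketch that writes a codebook in spreadsheet-friendly CSV form, checks that every data column is documented, and appends a revision entry to a log. The variable names, codes, column layout, and function names are all hypothetical, chosen only for this example; they are not a DDI standard or a required format.

```python
import csv
from datetime import date

# Hypothetical codebook entries: one row per variable in the dataset.
CODEBOOK = [
    {"variable": "resp_id", "label": "Respondent identifier",
     "type": "integer", "codes": ""},
    {"variable": "region", "label": "Survey region",
     "type": "categorical", "codes": "1=North; 2=South; 9=Missing"},
    {"variable": "q1_sat", "label": "Overall satisfaction (Q1)",
     "type": "ordinal", "codes": "1=Low; 2=Medium; 3=High; 9=Missing"},
]

# Assumed column layout for the codebook file.
FIELDS = ["variable", "label", "type", "codes"]


def write_codebook(entries, handle):
    """Write codebook entries as CSV to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)


def undocumented_columns(data_columns, entries):
    """Return the data columns that have no codebook entry."""
    documented = {entry["variable"] for entry in entries}
    return [col for col in data_columns if col not in documented]


def log_revision(handle, who, what, why, when=None):
    """Append one revision record (date, who, what, why) as a CSV row."""
    when = when or date.today()
    csv.writer(handle).writerow([when.isoformat(), who, what, why])
```

For example, calling `write_codebook(CODEBOOK, f)` on a file opened with `open("codebook.csv", "w", newline="")` produces a file that opens directly in any spreadsheet program, and `undocumented_columns` can be run against the dataset's header row before deposit to catch variables missing from the codebook.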
Project TIER has an extremely useful guide to best practices in project folder and file structure, including when and where to include ReadMe files, codebooks/data dictionaries, documentation of processing scripts, and more.
The Social Science Data Editors have released a very helpful template for ReadMe files for social science datasets.