Douglas College Library

Secondary Research: Statistics and Data

Before using statistics/data

Assess before you use
 

If you haven't assessed the quality, accuracy and relevance of a statistic or dataset, how can you responsibly use or share it?
 

  • Being a good consumer of data means adhering to basic research ethics, including ensuring that you don't play a role in spreading misleading, poorly derived or false data.

 

Check out the Quality Indicators and More Help sections of this guide to learn more.

 

Further considerations:

 

 

  • Permitted Uses:  Are there licence terms associated with the data you require?  Are you allowed to use the data the way you need to? 

     
  • Access Restrictions: Are you able to access the data, and in a timely manner? For example, some datasets are available only to researchers who have received ethics approval, and others are subject to embargoes, which limit access until a set amount of time has elapsed.

File Types and Statistical Software

Understand the available file types

 

Datasets are generally provided in file types that are optimized for data analysis: most typically .csv (compatible with Excel), as well as the file types specific to statistical analysis programs such as SPSS, Stata, SAS and R.

 

  • FYI: File types such as PDFs or image files (e.g., JPEG) will not allow you to analyze, clean, sort, filter or otherwise manipulate the data for your purposes. If the data are essentially "read only," can you actually make any use of them?

     
  • Data are occasionally provided in document files, e.g., Word tables, which would require you to copy and paste them into a data-analysis-friendly file format before you can undertake your own analysis.

     
    • Be aware of the additional time that will be required to move the data into a new program, and of the potential for errors to arise during the process.
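
To illustrate why an analysis-friendly file type matters, here is a minimal sketch in Python that loads CSV data and then filters and sorts it. Python is used only as an example; this guide does not prescribe a particular language or program. The sample data, the column names and the `load_rows` helper are all invented for illustration.

```python
import csv
import io

# Hypothetical sample data, invented for illustration. In practice you would
# read a real file, e.g. open("survey.csv", newline="", encoding="utf-8").
SAMPLE_CSV = """region,year,population
North,2020,1200
South,2020,950
North,2021,1250
South,2021,
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(SAMPLE_CSV)

# Filter: keep only the rows for 2021.
rows_2021 = [row for row in rows if row["year"] == "2021"]

# Sort: order the rows that have a population value from largest to smallest.
with_population = [row for row in rows if row["population"] != ""]
by_population = sorted(
    with_population, key=lambda row: int(row["population"]), reverse=True
)

print(len(rows), "rows loaded")
print([row["region"] for row in rows_2021])
print(by_population[0])
```

None of these operations is possible on a PDF or an image of the same table without first re-entering or extracting the values, which is where transcription errors tend to creep in.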
       

 

Software Considerations
 

 

  • Do you have access to a statistical software program, such as Excel, SPSS, Stata, SAS or R?

     
  • Do you know how to use the relevant program?

     
  • Do you have the time / funds to undertake training in the relevant program?

 

 

For some guidance on statistical programs / data analysis see:

 

 

Cleaning your Data

Key Considerations
 

 

  • Statistical software: If you have a choice of data file formats, consider choosing the one that works with the statistical software you know best and already own or can access, even if that software doesn't have every possible feature.

     
    • Ask yourself:  do you have the time and resources to acquire and learn a new program?

       
  • Check: are any data fields missing entries?

     
    • Does the accompanying documentation explain what a missing entry might signify? For example: no response, not applicable, or question not asked of that respondent?
       
    • Strategize: What's the best method of dealing with missing entries for your purposes? Will you simply delete the entire entry, or will you fill in the field with an estimated value, a technique known as "imputation"?
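
The two strategies for missing entries can be sketched as follows. This is a minimal Python illustration under stated assumptions: the records, the field names and the helper functions (`count_missing`, `drop_incomplete`, `impute_mean`) are hypothetical, and mean imputation is just one simple imputation method among many, not a recommendation.

```python
import statistics

# Hypothetical records, invented for illustration; None marks a missing entry.
# Check the documentation first: a blank may mean "no response",
# "not applicable" or "question not asked", and each may need different handling.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},
    {"id": 3, "age": 45, "income": None},
    {"id": 4, "age": 29, "income": 48000},
]

def count_missing(rows):
    """Return the number of missing (None) entries in each field."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + (value is None)
    return counts

def drop_incomplete(rows):
    """Option 1: delete every record that has at least one missing entry."""
    return [row for row in rows if all(v is not None for v in row.values())]

def impute_mean(rows, field):
    """Option 2: fill missing entries in one field with the mean of observed values."""
    observed = [row[field] for row in rows if row[field] is not None]
    fill = statistics.mean(observed)
    # Build new dictionaries so the original records stay untouched.
    return [{**row, field: fill if row[field] is None else row[field]} for row in rows]

missing = count_missing(records)
complete = drop_incomplete(records)
imputed = impute_mean(records, "age")

print(missing)
print([row["id"] for row in complete])
print(imputed[1])
```

Deleting records shrinks the sample, while filling in the mean keeps every record but understates how much the values actually vary. Whichever you choose, the choice belongs in your documentation.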

 

  • Are the variable labels, e.g., the column headers, clear and comprehensible?

     
    • If not, is there an accompanying code book that provides definitions?
     
  • Is there a data sources guide explaining how the data were derived, by whom and when?  Have you read it?

 

  • Is the formatting consistent? For example, are all dates entered in the same format?
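
A consistency check of this kind can be automated. The sketch below, in Python, counts how many values in a date column match each of a few candidate formats. The candidate formats, the sample values and the function names are assumptions made for illustration; adjust them to whatever the dataset's documentation says the field should contain.

```python
from datetime import datetime

# Hypothetical candidate formats: display label -> strptime pattern.
CANDIDATE_FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "DD/MM/YYYY": "%d/%m/%Y",
}

def detect_format(value):
    """Return the label of the first candidate format that parses the value, else None."""
    for label, pattern in CANDIDATE_FORMATS.items():
        try:
            datetime.strptime(value, pattern)
            return label
        except ValueError:
            continue
    return None

def summarize_formats(values):
    """Count how many values match each format label (None means unrecognized)."""
    summary = {}
    for value in values:
        label = detect_format(value.strip())
        summary[label] = summary.get(label, 0) + 1
    return summary

# Sample values invented for illustration.
dates = ["2021-03-04", "2021-03-05", "04/03/2021", "unknown"]
summary = summarize_formats(dates)
print(summary)
```

More than one key in the summary signals mixed formatting. Note that a pattern match cannot resolve real ambiguity: 04/03/2021 is also a valid MM/DD/YYYY date, so only the data documentation can tell you which reading was intended.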

     
NOTE: You'll need to decide how you're going to clean the data you'd like to use, and thoroughly document what you did, so that *your* work is reproducible.
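
One way to keep that documentation honest is to record each cleaning step as you apply it. The following Python sketch shows the idea; the `CleaningLog` class, the sample rows and the step descriptions are hypothetical, and a written log or a commented script in your statistical software would serve the same purpose.

```python
import json

class CleaningLog:
    """Apply cleaning steps to a list of rows and record what each step did."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.steps = []

    def apply(self, description, func):
        """Run func on the current rows and log the row counts before and after."""
        before = len(self.rows)
        self.rows = func(self.rows)
        self.steps.append(
            {"step": description, "rows_before": before, "rows_after": len(self.rows)}
        )
        return self

    def report(self):
        """Return the recorded steps as formatted JSON text."""
        return json.dumps(self.steps, indent=2)

# Sample rows invented for illustration.
raw = [
    {"city": " Vancouver ", "count": "10"},
    {"city": "Burnaby", "count": ""},
    {"city": "Burnaby", "count": "7"},
]

log = CleaningLog(raw)
log.apply(
    "Trim whitespace from city names",
    lambda rows: [{**r, "city": r["city"].strip()} for r in rows],
)
log.apply(
    "Drop rows with an empty count",
    lambda rows: [r for r in rows if r["count"] != ""],
)

print(log.report())
```

Because the steps are stored in order with their effect on the row count, someone else (or you, months later) can rerun the same cleaning on the original file and confirm they arrive at the same dataset.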

 

To learn more about data cleaning see: