
Data Literacies

Guidance on finding, evaluating, and asking meaningful questions with data.

About Data Literacy

The growing volume of data, the democratization of data, and the proliferation of data tools have increased the need for data literacy among the general population and in organizations. Data literacy is the ability to find data, evaluate its source and quality, understand it, manipulate it, ask questions of it, make an argument from it, and assess the arguments of others.

Rahul Bhargava from MIT and Catherine D'Ignazio from Emerson College (Knight, 2017) divide data literacy into four components:

Reading the data: Comprehending data in various forms and being able to read the language of data.

Working with the data: People work with data in various forms, depending on their role. Is the person a student, a statistician, or a data visualization expert? Each of these roles works with data in a different way.

Analyzing the data: Using various skills to analyze the data. The specific skills depend on the goals of the person doing the analysis, and might range from computing basic summary statistics to creating machine learning models.

Arguing with the data: Using the data to support your idea or research.

Examples of data literacy skills

  • Evaluating the quality of data sources to be used in analysis.
  • Being able to interpret data visualizations such as histograms or scatter plots.
  • Communicating the results of a data analysis or visualization to a general audience.

The Data Lifecycle

Stage 1: Pre-project

Idea - You have generated an idea, found collaborators, and begun thinking about what to do.

  • Narrowing the scope of your idea with your collaborators

Planning - This essential stage impacts every other stage.

  • Develop conventions
  • Identify storage: know your requirements and what the storage offers
  • Keep best practices in mind for your whole project
  • Consider pre-registering your project at this stage, that is, specifying your research plan before you start your study and submitting it to a registry
  • Apply for funding, identify tools, methods, and potential data sources

 

Stage 2: Active

Collection - In this stage, researchers gather data and other materials key to the project.

  • Understanding how to set up tools for accurate collection
  • Being able to identify reputable sources for data finding
  • Understanding the previous methods used to collect or create the data you find
  • Documenting primary data (collected by you) and secondary data (collected by others)
  • Making provisions so that those who need to work with the data can do so in a structured way

 

Stage 3: Explore

Wrangle - You prepare the data for analysis

  • Being able to use or reuse data depends on understanding what you have
  • Define variables, expected values, relationships, units, etc.
  • Document your methods, code, and decisions; this is key to understanding and explaining your findings and sharing them with others
  • You use tools to look for relationships and structures within data
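The variable definitions, expected values, and units listed above can be recorded in a small data dictionary and checked against the data. The following is a minimal sketch of one possible approach, not a prescribed method. The variable names, the ranges, and the `check_row` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical data dictionary: each variable gets a definition, a unit,
# and the range of values we expect to see.
DATA_DICTIONARY = {
    "temp_c": {"definition": "Air temperature", "unit": "degrees Celsius",
               "min": -60.0, "max": 60.0},
    "humidity_pct": {"definition": "Relative humidity", "unit": "percent",
                     "min": 0.0, "max": 100.0},
}


def check_row(row):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        if name not in row or row[name] in ("", None):
            problems.append(f"{name}: missing value")
            continue
        try:
            value = float(row[name])
        except (TypeError, ValueError):
            problems.append(f"{name}: not a number ({row[name]!r})")
            continue
        if not spec["min"] <= value <= spec["max"]:
            problems.append(f"{name}: {value} outside expected range")
    return problems


# Invented records: the second has an implausible temperature and a gap.
rows = [
    {"temp_c": "21.5", "humidity_pct": "40"},
    {"temp_c": "215", "humidity_pct": ""},
]
for number, row in enumerate(rows, start=1):
    for problem in check_row(row):
        print(f"row {number}: {problem}")
```

Keeping the dictionary next to the data also documents what each variable means, which helps in the later sharing and reuse stages.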
 

 

Stage 4: Results

Visualize - You create graphical representations of numbers and examine how to communicate the structures in the data

  • Knowing the most appropriate and impactful way to communicate requires an understanding of what different visualization methods do
  • Choosing the appropriate variables in your data for the visualization
  • Understanding the power of color and symbols
  • Citing sources used

Interpret - You articulate your inferences: what the data relationships tell us

  • Being aware of biases present when interpreting the data
  • Using both writing and visualizations as means of interpretation

 

Stage 5: Post-project

Sharing - You prepare data for long-term access and preservation

  • Curation and documentation are how data is prepared for long-term access and preservation
  • Data curation includes treatments that help make data FAIR (findable, accessible, interoperable, reusable)
  • Good documentation from other stages can really facilitate this
  • Setting up sharing platforms and permissions to keep data secure
  • Ensuring that you keep raw copies of any data you have before modifying them
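One way to keep raw copies trustworthy is to copy each file into a dedicated folder before modifying it and to confirm with a checksum that the copy matches the original. The sketch below assumes this workflow. The helper names `sha256_of` and `archive_raw_copy`, the `raw` folder, and the sample file are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_raw_copy(source, raw_dir):
    """Copy a file into raw_dir and verify the copy matches the original."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / Path(source).name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Copy of {source} does not match the original")
    return target


# Demonstration in a temporary folder so no real files are touched.
with tempfile.TemporaryDirectory() as workspace:
    original = Path(workspace) / "survey.csv"
    original.write_text("id,answer\n1,yes\n", encoding="utf-8")
    raw_copy = archive_raw_copy(original, Path(workspace) / "raw")
    copies_match = sha256_of(original) == sha256_of(raw_copy)
    print("raw copy verified:", copies_match)
```

Recording the digest alongside the raw file lets you, or anyone reusing the data, confirm later that the raw copy has not changed.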

Reuse - Your well-documented data can be used in further research

  • Key in this stage is consistent conventions
  • Submitting data to an appropriate institutional or global repository
  • Having metadata and README files that promote reuse

 

Data Types

  • Sample size: Are the data presented in the study representative of the population at large? Is there a misrepresentation of the sample size?
  • Data collection: How was the data collected? What kind of sampling was used when collecting the data?
  • Data source: Bias and reputation of the source are important to consider. In addition, was the chart created by a source that aligns with your ideological beliefs? Where possible, check several sources explaining a specific data trend to guard against confirmation bias.
  • Absolute values or proportions: Are the values absolute or based on a proportion? Both are useful, but it is important to put absolute values within the context of the whole population. On the other hand, relying solely on a proportion can obscure the absolute values.
  • Model error: For data produced through statistics or modelling, such as quantitative data, 3D data, or spatial data, high model error can produce less accurate values.
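The point about absolute values and proportions can be illustrated with a short calculation. The figures below are invented for two hypothetical towns, and the `rate_per_100k` helper is likewise only an illustration.

```python
# Invented figures for two hypothetical towns, for illustration only.
towns = {
    "Town A": {"cases": 500, "population": 1_000_000},
    "Town B": {"cases": 50, "population": 10_000},
}


def rate_per_100k(cases, population):
    """Express a count as a rate per 100,000 people."""
    return cases / population * 100_000


for name, figures in towns.items():
    rate = rate_per_100k(figures["cases"], figures["population"])
    print(f"{name}: {figures['cases']} cases, {rate:.0f} per 100,000 people")
```

In this made-up example, Town A has ten times as many cases as Town B, yet Town B's rate is ten times higher. Reporting only one of the two numbers would give a misleading picture.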

Leaping Data Hurdles: a checklist 

 

 

Data management issues checklist

  • Did you give your file a short, meaningful filename that follows your conventions? Do you remember or have you documented what that is? Some applications limit the number of characters that can be used in the file path or file name.
  • Do you have a logical, usable folder structure? Do you remember or have you documented what that is? 
  • Did you navigate to your data folder or geodatabase (never accept default location until you confirm it is right)? If you saved it in Documents/ArcGIS/Default.gdb, it may be lost. 
  • Are your files extracted from a zipped folder before trying to open them in the application? 
  • Are all parts of your shapefile together (don’t delete or move parts of your shapefile)?
  • Did you work on your geodatabase only in ArcGIS (don’t add or remove anything from your geodatabase outside of the application)? The geodatabase may look like a folder, but it IS NOT!
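Some of the file-naming items in this checklist can be checked automatically. The sketch below assumes a convention of lowercase letters, digits, underscores, and hyphens, plus length limits of 40 and 200 characters. These rules and the `filename_problems` helper are hypothetical, so substitute your project's own conventions.

```python
import re
from pathlib import PurePosixPath

# Assumed conventions for illustration; substitute your project's own rules.
MAX_NAME_LENGTH = 40     # hypothetical limit on the file name itself
MAX_PATH_LENGTH = 200    # hypothetical limit on the full path
ALLOWED_NAME = re.compile(r"[a-z0-9_\-]+(\.[a-z0-9]+)+")


def filename_problems(path):
    """Return a list of ways the path breaks the assumed conventions."""
    path = PurePosixPath(path)
    problems = []
    if not ALLOWED_NAME.fullmatch(path.name):
        problems.append("name should use only lowercase letters, digits, '_' and '-'")
    if len(path.name) > MAX_NAME_LENGTH:
        problems.append(f"name is longer than {MAX_NAME_LENGTH} characters")
    if len(str(path)) > MAX_PATH_LENGTH:
        problems.append(f"full path is longer than {MAX_PATH_LENGTH} characters")
    if path.suffix == ".zip":
        problems.append("extract the zipped folder before opening its contents")
    return problems


# Invented example paths.
for candidate in [
    "projects/wells/wells_2021-05.csv",
    "My Final Data (v2).xlsx",
    "downloads/tracts.zip",
]:
    print(candidate, "->", filename_problems(candidate) or "ok")
```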

 

Joining data checklist

  • Did you remove spaces or special characters from your column headers? 
  • Are your column headers long? Headers that are too long may be truncated later on, especially when working with shapefiles. Shapefile field names are limited to 10 characters, so 10 characters or fewer is a safe rule of thumb.
  • Are your values exact matches (spelling, spaces, case)? 
  • Do your join fields share the same data type (string, number, etc.)? 
  • Are you joining similar data to other similar data (census tracts to census tracts, etc.)? 
  • Did you save your table as a comma-separated values (.csv) file? Excel formats can cause issues.
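Several items on this checklist can be handled in a short script before the join is attempted. The following sketch uses only the Python standard library. The helper names, the `tract_id` field, and the sample records are hypothetical. The default header length of 10 reflects the shapefile field-name limit and can be changed to suit your own rule of thumb.

```python
import csv
import io
import re


def clean_header(header, max_length=10):
    """Replace spaces and special characters with '_' and trim the length."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", header.strip()).strip("_")
    return cleaned[:max_length]


def normalize_key(value):
    """Make join values comparable: text type, no stray spaces, one case."""
    return str(value).strip().upper()


def unmatched_keys(left_rows, right_rows, key):
    """Return keys present in left_rows that have no partner in right_rows."""
    right = {normalize_key(row[key]) for row in right_rows}
    return sorted({normalize_key(row[key]) for row in left_rows} - right)


# Invented records: IDs stored as a number, as padded text, and in mixed case.
tracts = [{"tract_id": 101}, {"tract_id": " 102 "}, {"tract_id": "a103"}]
incomes = [{"tract_id": "101"}, {"tract_id": "102"}, {"tract_id": "A104"}]

print(clean_header("Median Income ($)"))
print(unmatched_keys(tracts, incomes, "tract_id"))

# Write the cleaned table as CSV (to memory here; use open(...) for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["tract_id"])
writer.writeheader()
writer.writerows({"tract_id": normalize_key(row["tract_id"])} for row in tracts)
csv_text = buffer.getvalue()
```

Listing the unmatched keys before joining shows which records would silently drop out, which is usually easier to diagnose than a join that returns fewer rows than expected.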

 

Displaying XY coordinates 

  • Is your X field longitude? 
  • Is your Y field latitude?
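A quick range check can catch swapped fields before the points are displayed. This sketch assumes coordinates in decimal degrees rather than a projected coordinate system. The `check_xy` helper is hypothetical.

```python
def check_xy(x, y):
    """Check that x looks like a longitude and y looks like a latitude.

    Assumes decimal degrees, not projected coordinates.
    """
    if not -180 <= x <= 180:
        return "x is outside the longitude range of -180 to 180"
    if not -90 <= y <= 90:
        if -90 <= x <= 90 and -180 <= y <= 180:
            return "y is outside the latitude range; x and y may be swapped"
        return "y is outside the latitude range of -90 to 90"
    return "ok"


# Minneapolis is near longitude -93.27, latitude 44.98.
print(check_xy(-93.27, 44.98))
print(check_xy(44.98, -93.27))
```

A range check cannot detect every swap: a point such as (40, 30) is valid either way around, so plotting a few known locations remains a useful second check.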