
Data Literacies

Guidance on finding, evaluating, and asking meaningful questions with data.

About Data Literacy

The growing volume of data, the democratization of data, and the proliferation of data tools have increased the need for data literacy among the general population and within organizations. Data literacy is the ability to find data, evaluate its source and quality, understand it, manipulate it, ask questions of it, make an argument from it, and assess the arguments of others.

Rahul Bhargava of MIT and Catherine D'Ignazio of Emerson College (Knight, 2017) divide data literacy into four components:

Reading the data: Comprehending data in its various forms and being able to read the language of data.

Working with the data: How people work with data depends on their role. Is the person a student, a statistician, or a data visualization expert? Each of these roles works with data in a different way.

Analyzing the data: Using various skills to analyze the data. The specific skills depend on the goals of the person doing the analysis, and might range from computing basic summary statistics to building machine learning models.

Arguing with the data: Using the data to support your idea or research.

Examples of data literacy skills

  • Evaluating the quality of data sources to be used in analysis.
  • Being able to interpret data visualizations such as histograms or scatter plots.
  • Communicating the results of a data analysis or visualization to a general audience.

The Data Lifecycle

Stage 1: Pre-project

Idea - You have generated an idea, found collaborators, and begun thinking about what to do.

  • Narrowing the scope of your idea with your collaborators

Planning - This essential stage impacts every other stage.

  • Develop conventions
  • Identify storage: know your requirements and what the storage offers
  • Keep best practices in mind for your whole project
  • Consider pre-registering your project at this stage (specifying your research plan before you start your study and submitting it to a registry)
  • Apply for funding, and identify tools, methods, and potential data sources

 

Stage 2: Active

Collection - In this stage, researchers gather data and other materials key to the project.

  • Understanding how to set up tools for accurate collection
  • Being able to identify reputable sources for data finding
  • Understanding the previous methods used to collect or create the data you find
  • Documenting primary data (collected by you) and secondary data (collected by others)
  • Making provisions so that those who need to work with the data can do so in a structured way

 

Stage 3: Explore

Wrangle - You prepare the data for analysis

  • Being able to use or reuse data depends on understanding what you have
  • Variable definition, expected values, relationships, units, etc.
  • Document your methods, code, and decisions; this is key to understanding and explaining your findings and to sharing them with others
  • You use tools to look for relationships and structures within data
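The points above about variable definitions, expected values, and units can be sketched as a small data dictionary that is checked against the data. This is a minimal illustration, not a prescribed workflow: the variable names, ranges, and records are hypothetical, and `find_unexpected` is a helper defined only for this example.

```python
# Hypothetical data dictionary: variable name -> definition, unit, expected range
data_dictionary = {
    "temp_c": {"definition": "Air temperature", "unit": "degrees Celsius", "min": -50, "max": 60},
    "rain_mm": {"definition": "Daily rainfall", "unit": "millimetres", "min": 0, "max": 500},
}

# Hypothetical records (illustrative values only)
records = [
    {"temp_c": 21.5, "rain_mm": 0.0},
    {"temp_c": 19.0, "rain_mm": -3.0},  # negative rainfall: outside expected values
    {"temp_c": 999, "rain_mm": 12.4},   # possibly a placeholder for a missing reading
]


def find_unexpected(records, dictionary):
    """Return (row index, variable, value) for missing or out-of-range values."""
    problems = []
    for i, record in enumerate(records):
        for name, spec in dictionary.items():
            value = record.get(name)
            if value is None or not (spec["min"] <= value <= spec["max"]):
                problems.append((i, name, value))
    return problems


problems = find_unexpected(records, data_dictionary)
for row, name, value in problems:
    unit = data_dictionary[name]["unit"]
    print(f"row {row}: {name} = {value} is outside the expected range ({unit})")
```

Keeping the dictionary alongside the data also documents what each variable means, which supports the later sharing and reuse stages.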
 

 

Stage 4: Results

Visualize - You create graphical representations of numbers and examine how to communicate the structures in the data

  • Knowing the most appropriate and impactful way to communicate requires an understanding of what different visualization methods do
  • Choosing the appropriate variables in your data for the visualization
  • Understand the power of color and symbols
  • Citing sources used

Interpret - You articulate your inferences: what the data relationships tell us

  • Being aware of biases present when interpreting the data
  • Using both writing and visualizations as means of interpretation

 

Stage 5: Post-project

Sharing - You prepare data for long-term access and preservation

  • Curation and documentation prepare data for long-term access and preservation
  • Data curation includes treatments that help make data FAIR (findable, accessible, interoperable, reusable)
  • Good documentation from other stages can really facilitate this
  • Setting up sharing platforms and permissions to keep data secure
  • Ensuring that you keep raw copies of any data you have before modifying them
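One way to keep an untouched raw copy before modifying data is sketched below. It is only an illustration under stated assumptions: `preserve_raw_copy` is a hypothetical helper, and the folder and file names in the usage comment are placeholders. The sketch copies a file into a separate folder, marks the copy read-only, and returns a checksum that can be recorded in the project documentation.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def preserve_raw_copy(source, raw_dir):
    """Copy `source` into `raw_dir`, mark the copy read-only, return (path, sha256)."""
    source = Path(source)
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)

    target = raw_dir / source.name
    if target.exists():
        # Never overwrite an existing raw copy
        raise FileExistsError(f"Raw copy already exists: {target}")

    shutil.copy2(source, target)  # copies contents and file timestamps
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # read-only

    checksum = hashlib.sha256(target.read_bytes()).hexdigest()
    return target, checksum


# Example usage (placeholder paths):
# raw_path, checksum = preserve_raw_copy("survey_responses.csv", "data/raw")
# print(raw_path, checksum)
```

Recording the checksum in a README lets anyone who later reuses the data confirm that the raw file has not changed.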

Reuse - Your well-documented data can be used in further research

  • Key in this stage is consistent conventions
  • Submitting data to an appropriate institutional or global repository
  • Having metadata and README files, which promote reuse

 

 

Qualitative data includes text, words, ideas, and observed behaviors. In general, those working with qualitative data attempt to describe and interpret human behavior based primarily on the words and actions of selected individuals. Qualitative data tends to be subjective in nature and uses information derived from experiences to support research findings. The analysis of qualitative data can come in many forms including highlighting key words, extracting themes or behaviors, and elaborating on concepts.

 

Quantitative data focuses on numeric data. Quantitative data is perceived to be objective in nature, and numbers are used to support research outcomes. Analysis of quantitative data involves statistical techniques, and the type of data collected guides the analysis process.

Examples

  • Raw numbers
  • Percentages
  • Percentiles
  • Mean, median, mode 
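As a minimal sketch, the summaries listed above can be computed with Python's standard `statistics` module. The scores are made-up values used only for illustration.

```python
import statistics

# Hypothetical raw numbers (illustrative only, not real data)
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

# Percentage of scores that are 5 or higher
pct_high = 100 * sum(s >= 5 for s in scores) / len(scores)

# Quartile cut points (the 25th, 50th, and 75th percentiles)
quartiles = statistics.quantiles(scores, n=4)

print("mean:", mean, "median:", median, "mode:", mode)
print("percent scoring 5 or higher:", pct_high)
print("quartiles:", quartiles)
```

Here the mean (5) is higher than the median (4.0) because the single score of 10 pulls the average up, which is one reason to report more than one summary.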

Types of Quantitative Data

Ordinal data is a categorical, statistical data type in which the variables have natural, ordered categories but the distances between the categories are not known.

Examples

  • Likert scale ("On a scale of 1-5...")
  • Satisfaction, happiness scales
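Because the distances between ordinal categories are unknown, summaries that rely only on order or frequency (the median and the mode) are generally safer than an average. The sketch below illustrates this with a hypothetical five-level satisfaction scale and made-up responses.

```python
import statistics

# Ordered categories of a hypothetical satisfaction scale, lowest to highest
levels = ["Very dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very satisfied"]

# Made-up survey responses (illustrative only)
responses = ["Satisfied", "Neutral", "Very satisfied", "Satisfied", "Dissatisfied"]

# Convert each response to its rank; the ranks encode order, not distance
ranks = [levels.index(r) for r in responses]

# median_low always returns a value that is present in the data,
# so it maps back to a real category
median_category = levels[statistics.median_low(ranks)]

# The mode only needs counts, so it works directly on the labels
mode_category = statistics.mode(responses)

print("median:", median_category)
print("mode:", mode_category)
```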

Interval data is ordered data in which the distances between the categories are known, but the scale has no absolute zero. The mean, median, and mode can be derived from interval data.

Examples

  • Temperature: Zero degrees Fahrenheit does not mean an absence of temperature; it is just one number on the scale.
  • Time: Time of day on a 12-hour clock.
  • Test scores 
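A short sketch of why the missing absolute zero matters: differences between interval values are meaningful, but ratios are not, because they change with the unit. The temperatures are illustrative values, and `f_to_c` is a helper defined here for the example.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


warm_f, cool_f = 80.0, 40.0

# In Fahrenheit the ratio suggests "twice as warm"...
ratio_f = warm_f / cool_f

# ...but the same two temperatures give a different ratio in Celsius,
# so the ratio reflects the scale, not the temperatures themselves.
ratio_c = f_to_c(warm_f) / f_to_c(cool_f)

# Differences, by contrast, convert consistently between the two scales:
# a gap of 40 Fahrenheit degrees is always a gap of 40 * 5/9 Celsius degrees.
diff_f = warm_f - cool_f
diff_c = f_to_c(warm_f) - f_to_c(cool_f)

print("ratio in Fahrenheit:", ratio_f)
print("ratio in Celsius:", round(ratio_c, 2))
print("difference:", diff_f, "F =", round(diff_c, 2), "C")
```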

 

Spatial data must include location variables, in addition to any other data about the location. It is divided into two main branches, vector and raster. Vector data is discrete, made up of features with an associated attribute table. Raster data is continuous, made up of a grid with a value in each cell.

Examples

  • Road data (vector)
  • Aerial Imagery (raster)
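The difference between the two branches can be sketched with plain Python structures. This is a simplified illustration rather than a real GIS format: the road names, coordinates, and cell values are hypothetical.

```python
# Vector: discrete features, each with a geometry and a row of attributes
roads = [
    {
        "geometry": [(0.0, 0.0), (1.0, 0.5), (2.0, 0.5)],  # line vertices (x, y)
        "attributes": {"name": "Main St", "lanes": 2},
    },
    {
        "geometry": [(1.0, 0.5), (1.0, 2.0)],
        "attributes": {"name": "Oak Ave", "lanes": 1},
    },
]

# Raster: a continuous grid in which every cell holds a value
# (for example, the brightness of one pixel of aerial imagery)
imagery = [
    [12, 15, 20],
    [14, 18, 25],
    [13, 17, 30],
]

# A vector layer is queried through the attributes of its features
two_lane = [f["attributes"]["name"] for f in roads if f["attributes"]["lanes"] >= 2]

# A raster is queried through row and column position in the grid
row, col = 1, 2
cell_value = imagery[row][col]

print("roads with two or more lanes:", two_lane)
print("value at row 1, column 2:", cell_value)
```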

 

Digital 3D data (three-dimensional data) is measured in height, width, and depth. It can be digitally captured or created in many different ways. The two main branches of this data type are volumetric data and surface data.

 

Chronological data is data which demonstrates or records time. 

  • Rainfall data which records the date and time.
  • Animal movement data which has the date and time of the movements of animals.

Evaluating data

  • Sample size: Are the data presented in the study representative of the population at large? Is there a misrepresentation of the sample size?
  • Data collection: How was the data collected? What kind of sampling was used when collecting the data?
  • Data source: The bias and reputation of the source are important to consider. In addition, is the chart created by a source that aligns with your ideological beliefs? Where possible, check several sources that explain a given data trend to guard against confirmation bias.
  • Absolute values or proportions: Are the values absolute, or are they based on a proportion? Both are useful, but it is important to put absolute values within the context of the whole population. On the other hand, relying solely on a proportion can minimize absolute values.
  • Model error: For data produced through statistics or modelling, such as quantitative data, 3D data, or spatial data, high model error can produce less accurate values.
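The point above about absolute values and proportions can be illustrated with a small calculation. The towns and counts below are invented for the example, and `rate_per_100k` is a helper defined only for this sketch.

```python
# Hypothetical counts for two towns (illustrative numbers only)
towns = {
    "Town A": {"cases": 500, "population": 1_000_000},
    "Town B": {"cases": 50, "population": 10_000},
}


def rate_per_100k(cases, population):
    """Express a count as a rate per 100,000 people."""
    return cases * 100_000 / population


rates = {name: rate_per_100k(t["cases"], t["population"]) for name, t in towns.items()}

# The town with the most cases is not the town with the highest rate
most_cases = max(towns, key=lambda name: towns[name]["cases"])
highest_rate = max(rates, key=rates.get)

for name, t in towns.items():
    print(f"{name}: {t['cases']} cases, {rates[name]:.0f} per 100,000 people")
print("most cases:", most_cases)
print("highest rate:", highest_rate)
```

Town A has ten times as many cases, while Town B has ten times the rate. Reporting only one of the two figures would give a very different impression of the same data.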