About Data Literacy

The growing volume of data, the democratization of data, and the proliferation of data tools have increased the need for data literacy among the general population and in organizations. Data literacy is the ability to find data, evaluate its source and quality, understand it, manipulate it, ask questions of it, make an argument from it, and assess the arguments of others.

Rahul Bhargava of MIT and Catherine D'Ignazio of Emerson College (Knight, 2017) divide data literacy into four components:

Reading data: Comprehending data in various forms and being able to read the language of data.

Working with data: People work with data in various forms, depending on their role. Is the person a student, a statistician, or a data visualization expert? Each of these roles works with data in a different way.

Analyzing data: Using various skills to analyze the data. The specific skills depend on the goals of the person doing the analysis, and might range from computing basic summary statistics to building machine learning models.

Arguing with data: Using the data to support your idea or research.

Examples of data literacy skills

Evaluating the quality of data sources to be used in analysis.
Being able to interpret data visualizations such as histograms or scatter plots.
Communicating the results of a data analysis or visualization to a general audience.

The Data Lifecycle

Stage 1: Pre-project

Idea - You have generated an idea, found collaborators, and begun thinking about what to do.

Narrowing the scope of your idea with your collaborators

Planning - This essential stage impacts every other stage.

Developing conventions
Identify storage - know your requirements, and what the storage offers
Keep best practices in mind for your whole project
You might consider pre-registering your project in this stage (specifying your research plan before you start your study and submitting it to a registry)
Apply for funding, identify tools, methods, and potential data sources

Stage 2: Active

Collection - In this stage, researchers gather data and other materials key to the project.

Understanding how to set up tools for accurate collection
Being able to identify reputable sources for data finding
Understanding the previous methods used to collect or create the data you find
Documenting primary data (collected by you) and secondary data (collected by others)
Making provisions so that those who need to work with the data can do so in a structured way

Stage 3: Explore

Wrangle - You prepare the data for analysis

Being able to use or reuse data depends on understanding what you have
Variable definition, expected values, relationships, units, etc.
Document your methods, code, and decisions; this is key to understanding and explaining your findings and to sharing them with others
You use tools to look for relationships and structures within data
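
The variable definitions, expected values, and units mentioned above can be written down as a small data dictionary and then checked against the data. The following is a minimal sketch of that idea in Python, not a prescribed method: the variable names, ranges, and the `check_rows` helper are hypothetical and would be replaced by your own project's definitions.

```python
# Hypothetical data dictionary: one entry per variable, recording its
# definition, its unit, and the values you expect to see.
DATA_DICTIONARY = {
    "site_id": {"definition": "Sampling site code", "unit": None,
                "allowed": {"A", "B", "C"}},
    "temp_c": {"definition": "Water temperature", "unit": "degrees Celsius",
               "min": 0.0, "max": 40.0},
}


def check_rows(rows, dictionary):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    for i, row in enumerate(rows, start=1):
        for name, spec in dictionary.items():
            if name not in row:
                problems.append(f"row {i}: missing variable '{name}'")
                continue
            value = row[name]
            # Categorical variables: value must be one of the allowed codes.
            if "allowed" in spec and value not in spec["allowed"]:
                problems.append(f"row {i}: unexpected {name}={value!r}")
            # Numeric variables: value must parse and fall inside the range.
            if "min" in spec:
                try:
                    number = float(value)
                except (TypeError, ValueError):
                    problems.append(f"row {i}: {name}={value!r} is not a number")
                    continue
                if not spec["min"] <= number <= spec["max"]:
                    problems.append(f"row {i}: {name}={number} outside expected range")
    return problems


# Invented example rows; the second has an unknown site and an implausible value.
rows = [
    {"site_id": "A", "temp_c": "18.5"},
    {"site_id": "D", "temp_c": "95"},
]
for problem in check_rows(rows, DATA_DICTIONARY):
    print(problem)
```

Keeping the dictionary next to the data also doubles as documentation for anyone who reuses the file later.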

Stage 4: Results

Visualize - You create graphical representations of numbers and examine how to communicate the structures

Knowing the most appropriate and impactful way to communicate requires an understanding of what different visualization methods do
Choosing the appropriate variables in your data for the visualization
Understand the power of color and symbols
Citing sources used

Interpret - You articulate your preferences; what data relationships tell us

Being aware of biases present when interpreting the data
Using both writing and visualizations as means of interpretation

Stage 5: Post-project

Sharing - You prepare data for long-term access and preservation

Preparing data for long-term access and preservation is done through curation and documentation
Data curation includes treatments that help make data FAIR (findable, accessible, interoperable, reusable)
Good documentation from other stages can really facilitate this
Setting up sharing platforms and permissions to keep data secure
Ensuring that you keep raw copies of any data you have before modifying them
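
One way to keep raw copies, sketched below under some assumptions, is to copy each incoming file into a separate folder, mark the copy read-only, and record a checksum so later changes can be detected. The folder name `raw`, the file name, and the `preserve_raw` helper are hypothetical; only Python standard-library calls are used.

```python
import hashlib
import shutil
import stat
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_raw(source, raw_dir):
    """Copy source into raw_dir, mark the copy read-only, and return
    the path of the copy together with its checksum."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / Path(source).name
    shutil.copy2(source, target)
    target.chmod(stat.S_IREAD)  # discourage accidental edits
    return target, sha256_of(target)


# Demonstration in a temporary folder with a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "survey_2024.csv"
    original.write_text("id,score\n1,42\n")
    raw_copy, checksum = preserve_raw(original, Path(tmp) / "raw")
    print(raw_copy.name, checksum == sha256_of(original))
    # Restore write permission so the temporary folder can be cleaned up.
    raw_copy.chmod(stat.S_IREAD | stat.S_IWRITE)
```

Storing the checksum in your project documentation lets you confirm later that the raw copy is still byte-for-byte the file you received.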

Reuse - Your well-documented data can be used in further research

Key in this stage is consistent conventions
Submitting data to an appropriate institutional or global repository
Having metadata and README files, which promote reuse

Data Types

Sample size: Are the data presented in the study representative of the population at large? Is there a misrepresentation of the sample size?
Data collection: How was the data collected? What kind of sampling was used when collecting the data?
Data source: The bias and reputation of the source are important to consider. In addition, is the chart created by a source that aligns with your ideological beliefs? Where possible, check several sources that explain a specific data trend to guard against confirmation bias.
Are the values absolute values or based on a proportion?: Both are useful, but it is important to put absolute values within the context of the whole population. On the other hand, solely relying on a proportion can potentially minimize absolute values.
Model error: For data produced through statistics or modelling, such as quantitative data, 3D data, or spatial data, high model error can produce less accurate values.
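
The point about absolute values and proportions can be illustrated with a short calculation. The numbers below are invented purely for illustration, and the `rate_per_1000` helper is hypothetical.

```python
# Invented counts for two places of very different size.
towns = {
    "Small town": {"cases": 50, "population": 1_000},
    "Large city": {"cases": 500, "population": 1_000_000},
}


def rate_per_1000(cases, population):
    """Cases per 1,000 residents."""
    return cases * 1000 / population


for name, t in towns.items():
    rate = rate_per_1000(t["cases"], t["population"])
    print(f"{name}: {t['cases']} cases, {rate} per 1,000 residents")
```

The large city has ten times as many cases in absolute terms, yet the small town's rate is one hundred times higher (50 versus 0.5 per 1,000). Reporting only one of the two figures would tell a very different story.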

Leaping Data Hurdles: A Checklist

Data management issues checklist

Did you give your file a short, meaningful filename that follows your conventions? Do you remember, or have you documented, what those conventions are? Some applications limit the number of characters that can be used in the file path or file name.
Do you have a logical, usable folder structure? Do you remember or have you documented what that is?
Did you navigate to your data folder or geodatabase (never accept default location until you confirm it is right)? If you saved it in Documents/ArcGIS/Default.gdb, it may be lost.
Are your files extracted from a zipped folder before trying to open them in the application?
Are all parts of your shapefile together? (Don’t delete or move parts of your shapefile.)
Did you only work on your geodatabase in ArcGIS? (Don’t add or remove anything from your geodatabase outside of the application.) The geodatabase may look like a folder, but it IS NOT!
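
Two items on this checklist lend themselves to a quick automated check: whether a file name follows a convention, and whether a shapefile's companion files are all present. The sketch below assumes a hypothetical naming rule (lowercase letters, digits, underscores, and hyphens, at most 30 characters); substitute your own documented convention. The three extensions listed are the mandatory parts of a shapefile; other parts, such as `.prj`, are optional in the format but often needed in practice.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical convention: lowercase letters, digits, underscores, and hyphens,
# at most 30 characters before the extension.
NAME_PATTERN = re.compile(r"^[a-z0-9_-]{1,30}$")

# A shapefile needs at least these three files, all sharing one base name.
REQUIRED_SHAPEFILE_PARTS = (".shp", ".shx", ".dbf")


def follows_convention(path):
    """True if the file name (without extension) matches the convention."""
    return bool(NAME_PATTERN.match(Path(path).stem))


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are absent next to shp_path."""
    shp_path = Path(shp_path)
    return [ext for ext in REQUIRED_SHAPEFILE_PARTS
            if not shp_path.with_suffix(ext).exists()]


print(follows_convention("roads_2024.shp"))     # matches the rule above
print(follows_convention("Final Map (2).shp"))  # spaces, capitals, parentheses

# Demonstration: a made-up shapefile whose .shx part has gone missing.
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".dbf"):
        (Path(tmp) / f"roads{ext}").write_bytes(b"")
    print(missing_shapefile_parts(Path(tmp) / "roads.shp"))
```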

Joining Data Checklist

Did you remove spaces or special characters from your column headers?
Are your column headers long? Headers that are too long may be truncated later on, especially when working with shapefiles. A good rule of thumb is 13 characters or fewer, and 10 or fewer for shapefiles, whose attribute field names cannot be longer than that.
Are your values exact matches (spelling, spaces, case)?
Do your join fields share the same data type (string, number, etc.)?
Are you joining similar data to other similar data (census tracts to census tracts, etc.)?
Did you save your table as a comma-separated values (.csv) file? Excel formats can cause issues.
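
Several of the join checks above can be run before you open your GIS software. The sketch below uses only the Python standard library; the table contents and the helper names `clean_header` and `unmatched_values` are hypothetical. It cleans column headers, lists join values with no exact match in the other table, and writes the cleaned table as CSV text.

```python
import csv
import io
import re


def clean_header(name):
    """Replace spaces and special characters with underscores."""
    return re.sub(r"[^0-9A-Za-z_]+", "_", name.strip()).strip("_")


def unmatched_values(left_rows, right_rows, left_field, right_field):
    """Return left join values with no exact match (spelling, spaces, case)
    among the right join values."""
    right_values = {row[right_field] for row in right_rows}
    return [row[left_field] for row in left_rows
            if row[left_field] not in right_values]


# Invented example tables. Note the trailing space in the first tract ID.
tracts = [{"tract_id": "001 "}, {"tract_id": "002"}, {"tract_id": "003"}]
income = [{"tract_id": "001", "Median Income ($)": "52000"},
          {"tract_id": "002", "Median Income ($)": "61000"}]

print(clean_header("Median Income ($)"))
print(unmatched_values(tracts, income, "tract_id", "tract_id"))

# Write the income table with cleaned headers as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([clean_header(h) for h in income[0]])
for row in income:
    writer.writerow(row.values())
print(buffer.getvalue())
```

Here `"001 "` is reported as unmatched even though it looks right on screen, which is exactly the kind of invisible difference (a trailing space) that makes a join silently drop rows.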

Displaying XY Coordinates

Is your X field longitude?
Is your Y field latitude?
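
A quick range check can catch swapped X and Y fields. The sketch below assumes coordinates in decimal degrees; the helper names and the example point (approximate coordinates for Minneapolis) are illustrative only. Note that a range check cannot detect every swap: if both values fall between -90 and 90, either order looks plausible.

```python
def looks_like_lon_lat(x, y):
    """True if x is a plausible longitude and y a plausible latitude,
    both in decimal degrees."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


def probably_swapped(x, y):
    """True if the pair is only plausible with x and y exchanged."""
    return not looks_like_lon_lat(x, y) and looks_like_lon_lat(y, x)


# Approximate location of Minneapolis: longitude -93.27, latitude 44.98.
print(looks_like_lon_lat(-93.27, 44.98))  # X = longitude, Y = latitude
print(probably_swapped(44.98, -93.27))    # fields entered the wrong way round
```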
