
Managing your data

Curating Data for Sharing & Archiving

1. Submit to the WURD Repository

2. The WURD director assigns to local or Data Curation Network (DCN) curator

3. Curator reviews the data and documentation following the DCN CURATED steps

4. Curator sends recommendations to increase FAIRness

5. You send back revisions

6. Curator marks curation complete

7. WURD director evaluates FAIRness and publishes the dataset

8. You receive notice of the published dataset with:

  • a DOI
  • documentation
  • downloadable dataset
  • DataCite metadata
  • indexing for search engines

9. WURD staff take preservation actions

 

  • Verification, reproduction, and objectivity are a few of the tenets of the scientific method that data sharing directly supports.
  • Fulfill funder requirements. The NSF, NIH, and other public and private funding agencies now encourage, if not require, data sharing.
  • Increase the impact of your research. See "Sharing Detailed Research Data Is Associated with Increased Citation Rate" (PLoS ONE).
  • Meet journal requirements. Some journals require you to make your data available as a condition of publication.

What is curation:

The active and ongoing management of data through its lifecycle of interest and usefulness to scholarship, science, and education. Data curation enables data discovery and retrieval, maintains data quality, adds value, and provides for re-use over time through activities including authentication, archiving, management, preservation, and representation. 

Johnston, Lisa R. Curating Research Data, Volume One: Practical Strategies for Your Digital Repository. 1st ed. Chicago, Illinois: American Library Association, 2017.

What WashU curators do:

  1. Advise on repository selection

  2. Review data and documentation

  3. Provide suggestions to improve FAIRness

  4. Help you prepare a readme file to accompany your data

The Office of Science and Technology Policy charged a subgroup to explore and write a report on the Desirable Characteristics of Data Repositories for Federally Funded Research. Below is an overview of the recommendations. For WashU's institutional repository, we can address these characteristics directly. General and domain repositories offer so many options that it is harder to be specific. Here is a desirable characteristics checklist, which you can copy to help you evaluate whether a data repository meets these characteristics. Check out https://www.re3data.org/ to explore data repositories.

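| Category | Guidance | Institutional (WashU) | General | Domain |
|---|---|---|---|---|
| Organizational Infrastructure | Free and Easy Access | yes | varies | varies |
| Organizational Infrastructure | Clear Use Guidance | yes | varies | varies |
| Organizational Infrastructure | Risk Management | Yes and.... | varies | varies |
| Organizational Infrastructure | Retention Policy | yes | varies | varies |
| Organizational Infrastructure | Long-term Organizational Sustainability | yes | varies | varies |
| Technology | Authentication | yes | yes | yes |
| Technology | Long-term Technical Sustainability | yes | varies | varies |
| Technology | Security and Integrity | yes | varies | varies |
| Digital Object Management | Unique Persistent Identifiers | yes | varies | varies |
| Digital Object Management | Metadata | yes | yes | yes |
| Digital Object Management | Curation/Quality Assurance | yes | usually not | varies |
| Digital Object Management | Broad and Measured Reuse | yes | varies | varies |
| Digital Object Management | Common Format | yes | varies | varies |
| Digital Object Management | Provenance | yes | varies | varies |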
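The checklist comparison described above can also be kept as a small script while you compare candidate repositories. The following is only a minimal sketch: the characteristic names are copied from the table, the grouping into three categories follows the category labels listed with it, and the function name `unmet_characteristics` and the yes/no `answers` dictionary are hypothetical conveniences, not part of any WashU or OSTP tool.

```python
# Desirable characteristics, copied from the table above.
# The grouping follows the three category labels listed with the table.
DESIRABLE_CHARACTERISTICS = {
    "Organizational Infrastructure": [
        "Free and Easy Access",
        "Clear Use Guidance",
        "Risk Management",
        "Retention Policy",
        "Long-term Organizational Sustainability",
    ],
    "Technology": [
        "Authentication",
        "Long-term Technical Sustainability",
        "Security and Integrity",
    ],
    "Digital Object Management": [
        "Unique Persistent Identifiers",
        "Metadata",
        "Curation/Quality Assurance",
        "Broad and Measured Reuse",
        "Common Format",
        "Provenance",
    ],
}


def unmet_characteristics(answers):
    """Return (category, characteristic) pairs not confirmed for a repository.

    `answers` is a hypothetical dict mapping a characteristic name to True
    when you have verified it. Anything missing or False counts as unmet.
    """
    unmet = []
    for category, names in DESIRABLE_CHARACTERISTICS.items():
        for name in names:
            if not answers.get(name, False):
                unmet.append((category, name))
    return unmet


if __name__ == "__main__":
    # Example: a repository for which only two characteristics are verified so far.
    candidate = {"Authentication": True, "Metadata": True}
    for category, name in unmet_characteristics(candidate):
        print(f"Still to verify: {name} ({category})")
```

An empty result means every listed characteristic has been checked off. The script does not judge how well a repository meets each item, so the yes/no answers still have to come from reading the repository's own policies.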

The NIH Data Management and Sharing (DMS) policy, effective January 25, 2023, encourages investigators to share data in an established repository.

Where does NIH want data shared?

  • Under the 2023 Data Management and Sharing (DMS) policy, NIH encourages investigators to use an established repository.

  • When selecting a repository, investigators should choose based on factors such as the sensitivity of the data, the size and complexity of the dataset, and the volume of requests anticipated.

What does not meet data sharing requirements

  • A personal website

  • A repository you built

  • “Available on request”

  • Social networking (e.g., academia.edu)

Choosing a Repository

Some programs mandate a specific repository. For those that don't:

Make a copy of this document, which is based on the NIH checklist, and make sure you can check off all of the listed characteristics.

Consider: 

  1. Is there a discipline specific repository that would suit my data?
  2. Look for repositories which meet desirable characteristics and the following guidance:
    • NIH supported Repositories for Sharing Scientific Data.
    • Small datasets (up to 2 GB in size) may be included as supplementary material to accompany articles submitted to PubMed Central (instructions).
    • Data repositories, including generalist repositories or institutional repositories, that make data available to the larger research community, institutions, or the broader public.
    • Large datasets may benefit from cloud-based data repositories for data access, preservation, and sharing.

What should I include in my NIH DMS plan?

  • Element 1: Data Type (describe the data collected, the data to share, the documentation of methods, and metadata)

  • Element 2: Related Tools, Software and/or Code (share any specialized tools or software, if possible)

  • Element 3: Standards (use community standards for data and metadata, e.g., DataCite)

  • Element 4: Data Preservation, Access, and Associated Timelines (where it will be shared, how it will be found, when it will be shared, how long it will be retained)

  • Element 5: Access, Distribution, or Reuse Considerations (licenses and documentation such as codebooks, data dictionaries, etc.)

  • Element 6: Oversight of Data Management and Sharing (roles and responsibilities)
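If you draft your plan section by section, a quick completeness check against the six elements above can catch an empty section before submission. This is a minimal sketch only: the element titles come from the list above, while the function name `missing_elements` and the idea of storing a draft as a dictionary of section text are assumptions made for illustration, not an NIH format.

```python
# The six NIH DMS plan elements, in the order listed above.
DMS_PLAN_ELEMENTS = [
    "Data Type",
    "Related Tools, Software and/or Code",
    "Standards",
    "Data Preservation, Access, and Associated Timelines",
    "Access, Distribution, or Reuse Considerations",
    "Oversight of Data Management and Sharing",
]


def missing_elements(draft):
    """List the elements that have no text yet in a draft plan.

    `draft` is a hypothetical dict mapping an element title to the text
    written for it. Blank or whitespace-only text counts as missing.
    """
    return [
        f"Element {number}: {title}"
        for number, title in enumerate(DMS_PLAN_ELEMENTS, start=1)
        if not draft.get(title, "").strip()
    ]


if __name__ == "__main__":
    # Example draft with two sections written and the rest still empty.
    draft = {
        "Data Type": "Survey responses and interview transcripts.",
        "Standards": "   ",  # whitespace only, so still reported as missing
        "Oversight of Data Management and Sharing": "PI and lab data manager.",
    }
    for element in missing_elements(draft):
        print("Not yet written:", element)
```

The check only confirms that each section has some text. Whether the text answers the prompts in parentheses above still needs a human read.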

How FAIR is your data?

What is FAIR

In 2016, the ‘FAIR Guiding Principles for scientific data management and stewardship’ were published in Scientific Data. The authors intended to provide guidelines to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets. The principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data. (Go-FAIR)
 

 

  • Findable

    • In an indexed repository, with a unique, persistent ID and rich metadata

  • Accessible

    • The repository uses open, standard protocols so the metadata and data can be accessed

  • Interoperable

    • Data are in formal, standard, open application languages

  • Reusable

    • Well documented, with explicit provenance and open licenses, and follows community standards
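The four points above can be turned into a rough self-check for a dataset record before you submit it. The sketch below is illustrative only and is not a formal FAIR assessment: the record fields (`doi`, `title`, `description`, `access_protocol`, `file_formats`, `license`, `readme`), the short lists of protocols and formats treated as open, and the function name `fair_self_check` are all assumptions chosen for the example.

```python
# Illustrative, non-exhaustive lists. These are assumptions for the example,
# not an official registry of open protocols or formats.
OPEN_PROTOCOLS = {"http", "https", "ftp"}
OPEN_FORMATS = {"csv", "txt", "json", "xml"}


def fair_self_check(record):
    """Return a rough yes/no answer for each FAIR principle.

    `record` is a hypothetical dict describing one dataset. The checks are
    deliberately crude stand-ins for the bullet points above.
    """
    formats = record.get("file_formats", [])
    return {
        # Findable: a persistent ID plus at least basic descriptive metadata.
        "Findable": all(
            bool(record.get(field)) for field in ("doi", "title", "description")
        ),
        # Accessible: data are retrieved over an open, standard protocol.
        "Accessible": record.get("access_protocol", "").lower() in OPEN_PROTOCOLS,
        # Interoperable: every file is in one of the formats treated as open.
        "Interoperable": bool(formats)
        and all(fmt.lower() in OPEN_FORMATS for fmt in formats),
        # Reusable: a license is stated and documentation accompanies the data.
        "Reusable": bool(record.get("license")) and bool(record.get("readme")),
    }


if __name__ == "__main__":
    # Placeholder values for illustration. The DOI below is not a real identifier.
    example = {
        "doi": "10.1234/example",
        "title": "Example dataset",
        "description": "Measurements collected for an example study.",
        "access_protocol": "HTTPS",
        "file_formats": ["csv", "xlsx"],
        "license": "CC-BY-4.0",
        "readme": True,
    }
    for principle, ok in fair_self_check(example).items():
        print(f"{principle}: {'ok' if ok else 'needs attention'}")
```

In this example the proprietary spreadsheet format is what flags Interoperable for attention. A curator's review covers far more than these four booleans, so treat the output as a prompt for questions rather than a score.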