Managing your data

Curating Data for Sharing & Archiving

Why share your data:

  • Verification, reproduction, and objectivity are a few of the tenets of the scientific method that data sharing directly supports.
  • To fulfill funder requirements. The NSF, NIH, and other public and private funding agencies are now encouraging, if not requiring, data sharing.
  • To increase the impact of your research. See "Sharing Detailed Research Data Is Associated with Increased Citation Rate" (PLoS ONE).
  • Some journals require that you make your data available as a condition of publication.

What is curation:

The active and ongoing management of data through its lifecycle of interest and usefulness to scholarship, science, and education. Data curation enables data discovery and retrieval, maintains data quality, adds value, and provides for re-use over time through activities including authentication, archiving, management, preservation, and representation. 

Johnston, Lisa R. Curating Research Data, Volume One: Practical Strategies for Your Digital Repository. 1 edition. Chicago, Illinois: Amer Library Assn, 2017.

What WashU curators do:

  1. Advise on repository selection
  2. Review data and documentation
  3. Provide suggestions to improve the FAIRness (findability, accessibility, interoperability, and reusability) of your data
  4. Help you prepare a README file to accompany your data

WashU Curation Workflow:

You can get started with WashU curators in one of two ways.

1. By requesting a consultation. In a consultation we can:

  • discuss your data needs
  • help you plan your data preparation
  • help you choose a repository
  • help you submit data

2. By submitting data to our repository. This automatically initiates curation:

  • Administrators review your submission
  • Administrators issue a DOI
  • Administrators send the dataset to a curator (local or from the Data Curation Network, DCN) who follows the DCN CURATED steps
    • Check the data and documentation, create a file manifest
    • Understand the metadata completeness and data functioning, variable descriptions, contextual information, etc.
    • Request information / provide recommendations
    • Augment the data and documentation to increase FAIRness
      • Create an interoperable, plain-text README, a format that is unlikely to ever become obsolete
        • It provides structured information about the data, project, and creator
        • It helps re-users understand the limitations of the dataset
        • It helps re-users understand how to use the data appropriately
        • It’s a place to collect all the relevant identifiers to help connect the data to authors, published results, institutions, and funders

    • Transform to open formats, where possible
    • Evaluate the dataset against FAIR principles
    • Document the process and save it in an archival package along with a redundant copy of the original and improved data and documentation
  • Publish the dataset in our repository with:
    • An associated DOI
    • Robust metadata
    • Appropriate licensing
    • Author and institutional identification
    • 10-year retention, with review at that milestone
    • A preservation workflow in place
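
The "create a file manifest" and plain-text README steps in the workflow above could be sketched as follows. This is a minimal illustration in Python, not the repository's actual tooling: the function names, the README field names (`title`, `creators`, and so on), and the choice of SHA-256 checksums are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Return (relative_path, size_in_bytes, sha256_hex) for every file under root."""
    root = Path(root)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()  # fine for a sketch; stream very large files in chunks
        rows.append((path.relative_to(root).as_posix(),
                     len(data),
                     hashlib.sha256(data).hexdigest()))
    return rows


def render_readme(info, manifest):
    """Render a plain-text README from a dict of (hypothetical) fields and a manifest."""
    lines = [
        f"TITLE: {info['title']}",
        f"CREATORS: {'; '.join(info['creators'])}",
        f"DESCRIPTION: {info['description']}",
        f"LIMITATIONS: {info['limitations']}",
        f"USAGE NOTES: {info['usage']}",
        f"IDENTIFIERS: {'; '.join(info['identifiers'])}",
        "",
        "FILES (path, bytes, sha256):",
    ]
    lines += [f"  {name}\t{size}\t{digest}" for name, size, digest in manifest]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file named README.txt alongside the data keeps the documentation in an open format. The checksums let a later re-user confirm that the files they downloaded match what was deposited.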

 

The Office of Science and Technology Policy charged a subgroup to explore and write a report on the Desirable Characteristics of Data Repositories for Federally Funded Research. Below is an overview of the recommendations. For WashU's repository, we can address these characteristics directly; the general and domain categories include so many repositories that it is harder to be specific. Here is a desirable characteristics checklist, which you can copy and use to help you evaluate whether a data repository meets these characteristics. Check out https://www.re3data.org/ to explore data repositories.

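Guidance

Category | Characteristic | Institutional (WashU) | General | Domain
Organization Infrastructure | Free and Easy Access | yes | varies | varies
Organization Infrastructure | Clear Use Guidance | yes | varies | varies
Organization Infrastructure | Risk Management | Yes and.... | varies | varies
Organization Infrastructure | Retention Policy | yes | varies | varies
Organization Infrastructure | Long-term Organizational Sustainability | yes | varies | varies
Technology | Authentication | yes | yes | yes
Technology | Long-term Technical Sustainability | yes | varies | varies
Technology | Security and Integrity | yes | varies | varies
Digital Object Management | Unique Persistent Identifiers | yes | varies | varies
Digital Object Management | Metadata | yes | yes | yes
Digital Object Management | Curation/Quality Assurance | yes | usually not | varies
Digital Object Management | Broad and Measured Reuse | yes | varies | varies
Digital Object Management | Common Format | yes | varies | varies
Digital Object Management | Provenance | yes | varies | varies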
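
If you keep your copy of the checklist as simple yes/no answers, a few lines of Python can list what a candidate repository is still missing. This is only an illustrative sketch: the characteristic names are taken from the table above, while the `unmet` function and the example answers are hypothetical and describe no real repository.

```python
# Characteristic names taken from the table above.
CHARACTERISTICS = [
    "Free and Easy Access",
    "Clear Use Guidance",
    "Risk Management",
    "Retention Policy",
    "Long-term Organizational Sustainability",
    "Authentication",
    "Long-term Technical Sustainability",
    "Security and Integrity",
    "Unique Persistent Identifiers",
    "Metadata",
    "Curation/Quality Assurance",
    "Broad and Measured Reuse",
    "Common Format",
    "Provenance",
]


def unmet(answers):
    """Return the characteristics not answered "yes"; unanswered ones count as unmet."""
    return [c for c in CHARACTERISTICS
            if answers.get(c, "").strip().lower() != "yes"]


# Hypothetical answers for an imaginary repository, for illustration only.
example = dict.fromkeys(CHARACTERISTICS, "yes")
example["Curation/Quality Assurance"] = "usually not"
del example["Provenance"]  # left unanswered
print(unmet(example))  # ['Curation/Quality Assurance', 'Provenance']
```

Treating an unanswered characteristic as unmet is a deliberately cautious choice: a repository only passes once every item has been checked explicitly.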

The NIH Data Management and Sharing (DMS) policy took effect on January 25, 2023.

Where does NIH want data shared:

  • Under the 2023 Data Management and Sharing (DMS) policy, NIH encourages investigators to use an established repository.
  • When selecting a repository, investigators should choose based on factors such as the sensitivity of the data, the size and complexity of the dataset, and the volume of requests anticipated.

What does not meet data sharing requirements:

  • A personal website
  • A repository you built
  • “Available on request”
  • Social networking sites (e.g., academia.edu)

Choosing a Repository

Some programs mandate a specific repository. For those that don't:

Make a copy of this document, which is based on the NIH checklist, and make sure you can check off all of the listed characteristics.

Consider: 

  1. Is there a discipline specific repository that would suit my data?
  2. Look for repositories which meet desirable characteristics and the following guidance:
    • NIH supported Repositories for Sharing Scientific Data.
    • Small datasets (up to 2 GB in size) may be included as supplementary material to accompany articles submitted to PubMed Central (instructions).
    • Data repositories, including generalist repositories or institutional repositories, that make data available to the larger research community, institutions, or the broader public.
    • Large datasets may benefit from cloud-based data repositories for data access, preservation, and sharing.

What should I include in my NIH DMS plan:

  • Element 1: Data Type (describe the data collected, the data to share, the documentation of methods, and metadata)
  • Element 2: Related Tools, Software and/or Code (any specialized tools or software should be shared, if possible)
  • Element 3: Standards (use community standards for data and metadata, e.g., DataCite)
  • Element 4: Data Preservation, Access, and Associated Timelines (where the data will be shared, how it will be found, when it will be shared, and how long it will be retained)
  • Element 5: Access, Distribution, or Reuse Considerations (licenses and documentation such as codebooks, data dictionaries, etc.)
  • Element 6: Oversight of Data Management and Sharing (roles and responsibilities)

How FAIR is your data?