Preserving and Sharing your Data


As your research project draws to an end, it’s important to consider how you are going to preserve your data so that it remains usable in the long term. If there are no barriers to making your data openly available, you should also think about how you are going to share it with others.

There are many benefits to sharing data: not only does it allow the results of your research to be validated, but it also provides opportunities for new projects and collaborations with others. Many funders require data to be shared, as this helps to maximise the value of the data for the benefit of research and the wider community. Publishers, too, are increasingly asking authors to provide details of how the data underpinning articles can be accessed.

The University’s Research Data Management Policy highlights the institution’s expectations for data sharing.

This page covers a range of topics to help you preserve and share your data as you complete your research project. At this point you should have stopped working on your data, and no further changes should be made to it. For more on managing your data during the active phase of your research project, please see the guidance here.

Preservation should be guided by the long-term needs of your data, which differ from your day-to-day needs during the active phase of research. You are unlikely to need to keep all of the data you have generated, so you will need to think about which data you are going to keep and for how long.

Key things you’ll need to consider when assessing data for preservation include:

  • Legal and ethical requirements
  • Funder, publisher or institutional policies
  • The reuse value of the data
  • Significance of the data in scientific, historical or cultural terms
  • The uniqueness of the data and how difficult it is to recreate

Guidance on appraising data for preservation developed by the Digital Curation Centre can help you decide which data to keep: ‘Five steps to decide what data to keep: a checklist for appraising research data’.

You will also need to think about how you are going to dispose of any data you do not wish to keep. Any confidential material should be disposed of in accordance with the University’s confidential waste disposal procedure.

During the active phase of your research, the format of your data files may be determined by the software that you are using for collection or analysis. Once this stage is over, however, you’ll need to save your data in file formats that ensure it remains usable and accessible in the long term.

The file formats you use should ideally be:

  • Guided by common practice within your discipline
  • Non-proprietary (as proprietary formats are often not interoperable and are more likely to become obsolete)
  • Unencrypted and uncompressed
  • Based on an open, documented standard (e.g. ASCII or Unicode)

A list of file formats recommended by the UK Data Service for data sharing, reuse and preservation can be found here.
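As an illustration of the criteria above, here is a minimal Python sketch that saves tabular data as a plain, uncompressed CSV file encoded in UTF-8 (a Unicode encoding). The function names and the example column names are hypothetical, and CSV is only one possible choice: check the recommended formats for your discipline before converting anything.

```python
import csv
import io


def rows_to_csv_text(header, rows):
    """Return the header and rows as CSV text, one record per line."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def save_as_utf8_csv(path, header, rows):
    """Write the rows to an unencrypted, uncompressed UTF-8 CSV file."""
    # newline="" stops Python from altering the line endings we chose above.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(rows_to_csv_text(header, rows))


if __name__ == "__main__":
    # Hypothetical example data.
    save_as_utf8_csv(
        "measurements.csv",
        ["site", "temp_c"],
        [["Zürich", 1.5], ["Leeds", 3.0]],
    )
```

Stating the encoding explicitly, rather than relying on the operating system's default, means the file can be read back the same way on any machine.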


To share or not to share your data? 

After selecting the data you are going to preserve, you can decide whether or not it can be shared with others.

Whilst sharing data has many benefits, there may be legitimate reasons why it cannot be shared: there may be confidential, personal or sensitive data, the data could be subject to intellectual property rights, or there may be commercial interests that prevent sharing. Useful information about GDPR and research can be found here.

There aren’t any hard and fast rules when it comes to sharing data. If your data contain personal information, you may be able to anonymise your datasets and then share them. Similarly, it may be possible to redact sensitive material from your data so that it can be shared. Consider what is suitable for your data so that it is ‘as open as possible, as closed as necessary’.
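To give a rough idea of what preparing a dataset for sharing can involve, the Python sketch below drops direct identifiers from a record and replaces an exact age with an age band. The column names (`name`, `email`, `age`) and both function names are hypothetical. Removing obvious identifiers is only a first step: combinations of the remaining fields can sometimes still identify a person, so treat this as an illustration rather than a complete anonymisation procedure.

```python
# Hypothetical names of columns that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age, width=10):
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record):
    """Return a copy of a record (a dict) without direct identifiers.

    If the record has an 'age' field, it is replaced by 'age_band'.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        cleaned["age_band"] = age_band(cleaned.pop("age"))
    return cleaned


if __name__ == "__main__":
    participant = {"name": "A. Person", "email": "a@example.org", "age": 34, "score": 7}
    print(deidentify(participant))
```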

Depending on the nature of your data and any permissions (or lack thereof) for sharing, you may choose one of three levels of access for your data: open (freely accessible to all), closed (under a temporary or permanent embargo) or restricted (available under particular conditions, e.g. to bona fide researchers on request). For guidance on finding the right level of access for your data, please contact the Scholarly Communications Team.

There are many ways that researchers share their data with others.

Repositories

Repositories provide a secure space for research data. They manage access controls to data and provide permanent identifiers so that data can be identified and cited easily. Repositories preserve data for the long term and ensure they are findable by adding metadata.

Data journals

Data journals publish articles focused on the data underlying research studies and can also include a link to the datasets themselves. Data papers can be cited and thus provide additional credit to researchers.

Supplementary information accompanying journal articles

Data can be shared alongside journal articles as supplementary information so that readers can easily access the data underpinning the article. Many funders and publishers require authors to complete a data access statement when publishing journal articles. Although this is convenient for readers, there may be issues with long-term preservation and wider access to the data if you share solely via this method.

Websites

Data can be shared on project websites, which makes it potentially accessible to a wide audience. Long-term preservation is an issue, as websites need to be maintained and many fall into disuse once projects are complete and researchers have moved on. For this reason, use of a repository is also recommended.

E-mailed on request

Some researchers make their data available to individual researchers on request. Ideally this should not be your method for sharing data, as it relies on potential users being able to contact you, which can be difficult after a number of years. It can also be hard to define and agree terms for reuse via this method, which can be particularly important when dealing with sensitive data.

 

The best way to share your data easily and securely is via a trusted repository. Repositories are also the best way to ensure your data are FAIR (findable, accessible, interoperable, and reusable). You can read more about the FAIR principles for research data in this blogpost.

There are many types of data repository where you might deposit your data. If you are in receipt of funding from one of the UK research councils or another funder, you should check their requirements for data sharing beforehand, as some funders mandate the deposit of data in particular repositories (e.g. ESRC requires data to be deposited with the UK Data Service, and NERC has a network of data centres). Wellcome Open Research maintains a list of approved repositories which is also helpful for non-funded researchers.

Before depositing your data, you should evaluate the repository to ensure that it meets the needs of your data. Among the things to consider are the reputation of the repository, the reuse licence options, and the length of time your data will be stored. The Digital Curation Centre provides a useful checklist that can help you choose the right repository for your data.

 

Disciplinary repositories

You may wish to deposit your data in a repository that contains datasets on a similar subject to your own. The following databases can help you locate a suitable disciplinary repository:

Re3data

Fairsharing.org

Alternatively you may know of suitable disciplinary repositories from your academic networks.

 

Generalist repositories

Some repositories take data from all disciplines. Repositories that you might consider include Zenodo and Figshare.

When you deposit your data in a repository, descriptive metadata (data about data) about your datasets will be generated and exposed by the repository. The purpose of metadata is to make your datasets machine-readable, which helps potential users locate your data via databases or search engines. Repositories use international standards for metadata (they tend to apply these automatically), and these vary according to discipline. You can find standards relevant to your discipline by consulting the Metadata Standards Catalog, an open directory of metadata standards that can be applied to research data.
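To show what a machine-readable descriptive record can look like, here is a minimal Python sketch that assembles one and serialises it as JSON. The field names are an assumption, loosely modelled on elements commonly found in repository metadata (title, creators, publisher, publication year, resource type, identifier); the function names and example values are hypothetical, and the schema your repository actually uses may differ.

```python
import json


def build_metadata(title, creators, publisher, year, resource_type, identifier=None):
    """Assemble a simple descriptive metadata record as a dict.

    Field names are illustrative, not an official schema.
    """
    record = {
        "title": title,
        "creators": [{"name": name} for name in creators],
        "publisher": publisher,
        "publicationYear": year,
        "resourceType": resource_type,
    }
    # A persistent identifier is usually assigned by the repository on deposit.
    if identifier:
        record["identifier"] = identifier
    return record


def to_json(record):
    """Serialise the record as readable JSON text, keeping non-ASCII characters."""
    return json.dumps(record, indent=2, ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    example = build_metadata(
        title="Example survey dataset",
        creators=["Doe, Jane"],
        publisher="Example University",
        year=2024,
        resource_type="Dataset",
    )
    print(to_json(example))
```

In practice you will usually enter this information through the repository's deposit form, and the repository will produce the structured record for you.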

If you are going to be making your research data openly available to others via a repository you’ll need to think about applying a licence to your data. Licences specify what others can or cannot do with your data without the need to contact you directly for permission. Repositories apply licences to the datasets they hold and you can select an appropriate licence when you deposit your data.

Creative Commons licences are the most commonly used licences for research data internationally.

Other licensing schemes include Open Data Commons for databases and the Open Government Licence for UK Government data. If you are working with software, you’ll need to consider software-specific licences, e.g. OSI-approved licences. This License Selector is a useful tool for exploring data and software licence options.

Further help and information

If you need any information or guidance on any preservation or sharing issues relating to your data, please get in touch with the Scholarly Communications Team at wire@wlv.ac.uk.

 

Images used on this page

Photo by C M on Unsplash 

Photo by Raw Pixel under a CC0 licence