Creating and Using Research Data
Understanding the difference between research “data” and research “records” is often the first hurdle. Two questions help to tell them apart:
“Will I need this material to support a publication, or validate my research findings?”
“Will this item form part of a finalised data set once my work is complete?”
If the answer to either of these questions is “yes”, the item is part of your research data. Research records will usually need to be kept too, for audit purposes.
Research data can include:
- Recorded outputs of observations, experiments, or simulations
- Lab Books and Logs
- Models created and used to perform simulations and experiments
- Software tools created to capture, analyse, or otherwise use data
- Documentation that describes the project context, methods used, and data outputs produced, including email correspondence between collaborators
Funding bodies usually like to see that the data you gather or create fills a gap in knowledge and require you to demonstrate this. It is often cost effective to re-use data created elsewhere in different ways, perhaps creating a “mash-up” of data from different sources to demonstrate something new. This is attractive to funding bodies, because it means they are not funding the same data gathering exercises twice.
Many public research funding bodies and publishers now require that data be made publicly available. You need to understand the terms of your funding agreement before you start, so that you can take this into account.
You might also like...
Association of Medical Research Charities
Concordat on Open Research Data
MANTRA - Online Research Data Training Resources (University of Edinburgh)
Data Management Planning
A Data Management Plan (DMP) helps researchers and research students with their research methodology. Data management planning is an RGU requirement, and in many cases it is becoming a funder requirement at the point of submission.
A DMP covers the following basics:
- description, format and volume of data
- data storage and back-up measures
- data management roles and responsibilities
- infrastructure, costing or resources needed
- plans for sharing data including ethical and legal issues or restrictions on data sharing
- copyright and intellectual property rights of data
To help researchers, templates are available via the DMPonline tool. The tool includes video tuition, and RGU users can log in using their institutional credentials. Researchers who plan to submit to a funder for which there is no prepared template can still use the tool, which will provide a simple standard template. Research students can also use this template for planning data handling during their studies.
Workshops on data management planning and data handling are held regularly throughout the academic year.
Data Management Plans: Examples
A Guide to Writing a Wellcome Trust Data Management Plan
Good Practice Tips
- Know your legal, ethical and other obligations regarding research data, towards research participants, colleagues, research funders and institutions
- Implement good practices in a consistent manner
- Assign roles and responsibilities to relevant parties in the research
- Design data management according to the needs and purpose of research
- Incorporate data management measures as an integral part of your research cycle
- Implement and review data management throughout research as part of research progression and review
Case Study Examples
Writing a Data Management Plan
In April 2010, the Digital Curation Centre (DCC) launched DMP Online, a web-based tool designed to help researchers and other data stakeholders develop data management plans according to the requirements of major research funders.
Using the tool, researchers can create, store and update multiple versions of a data management plan at the grant application stage and during the research cycle. Plans can be customised and exported in various formats. Funder- and institution-specific best practice guidance is available.
The tool combines the DCC’s comprehensive ‘Checklist for a Data Management Plan’ with an analysis of research funder requirements. The DCC is working with partner organisations to include domain- and subject-specific guidance in the tool.
Submitting a Data Management Plan
The Rural Economy and Land Use (RELU) Programme has been at the forefront of implementing data management planning for research projects since 2004. Drawing on best practice in data management and sharing across three research councils (ESRC, NERC and BBSRC), RELU requires that all funded projects develop and implement a Data Management Plan to ensure that data are well managed throughout the duration of a research project. In a data management plan researchers describe:
- the need for access to existing data sources
- data to be produced by the research project
- quality assurance and back-up procedures
- plans for management and archiving of collected data
- expected difficulties in making data available for secondary research and measures to overcome such difficulties
- who holds copyright and Intellectual Property Rights of the data
- who has data management responsibility roles within the research team
Formatting
The format and software used to create research data depend on the hardware or software available, on how researchers plan to analyse the data and, in some cases, on discipline-specific standards and customs.
Despite the backward compatibility of many software packages to import data created in previous software versions, the safest option to guarantee long-term data access is to convert data to standard formats.
Keep it Organised!
Well-organised file names and folder structures make it easier to find and keep track of data files. Develop a system that works for your project and use it consistently. Whilst computers add basic information and properties to a file, this is not reliable data management. It is better to record essential information in file names or through the folder structure. Think carefully about how best to structure files in folders, so that files and versions are easy to locate and organise. When working in collaboration, the need for an orderly structure is even greater.
Keep Track of Changes and Locations
It is important to ensure that different versions of files, related files held in different locations, and information that is cross-referenced between files are all subject to version control. It can be difficult to locate a correct version or to know how versions differ after some time has elapsed.
It is important to keep track of master versions of files, for example the latest iteration, especially where data files are shared between people or locations, e.g. on both a PC and a laptop. Checks and procedures may also need to be put in place to make sure that if the information in one file is altered, the related information in other files is also updated.
Because digital information can be copied or altered so easily, it is important to be able to demonstrate the authenticity of data and to be able to prevent unauthorised access to data that may potentially lead to unauthorised changes.
Ensure Good Quality Control!
Quality control of data is an integral part of all research and takes place at various stages. It is important to assign clear roles and responsibilities for data quality assurance at all stages of research and to develop suitable procedures before data gathering starts. Quality control measures during data collection may include:
- calibration of instruments
- checking the truth of the record with an expert
- using standardised methods and protocols for capturing observations
- using computer-assisted interview software to standardise interviews and verify response consistency
- checking data completeness
- verifying random samples of the digital data against the original data
- statistical analyses such as frequencies, means, to detect errors and anomalous values
- peer review
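Two of the checks listed above, data completeness and simple statistics to flag anomalous values, can be sketched in a few lines of Python. This is only an illustration: the function names, the field names and the cut-off of two standard deviations are assumptions chosen for the example, not part of any prescribed procedure, and flagged values still need to be checked by a person.

```python
from collections import Counter
from statistics import mean, stdev


def missing_fields(records, required):
    """Return (row_index, field) pairs where a required value is absent or empty."""
    return [
        (index, field)
        for index, record in enumerate(records)
        for field in required
        if record.get(field) in (None, "")
    ]


def frequencies(records, field):
    """Count how often each value of `field` occurs, to spot unexpected codes."""
    return Counter(record.get(field) for record in records)


def outliers(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    if len(values) < 2:
        return []
    centre, spread = mean(values), stdev(values)
    if spread == 0:
        return []
    return [value for value in values if abs(value - centre) / spread > threshold]


# Hypothetical records, as they might be read from a data entry spreadsheet.
records = [
    {"id": "a", "site": "north", "depth": "1.2"},
    {"id": "b", "site": "north", "depth": ""},
    {"id": "c", "site": "nroth", "depth": "1.4"},
]
print(missing_fields(records, ["id", "site", "depth"]))
print(frequencies(records, "site"))
print(outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 95]))
```

The frequency count makes the mistyped site code visible, and the outlier check flags the reading that is far from the rest so that it can be compared against the original record.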
Good-quality, consistent transcription relies on transcription instructions or guidelines and a template to ensure uniformity across a collection. Full transcription is recommended for data sharing. If transcription is outsourced, take care with:
- data security when transmitting data between researcher and transcriber
- data security procedures for the transcriber to follow
- a non-disclosure agreement for the transcriber
- transcriber instructions or guidelines, indicating required transcription style, layout and editing
Transcripts should:
- have a unique identifier that labels an interview either through a name or number
- have a uniform layout throughout a research project or data collection
- use speaker tags to indicate turn-taking or question/answer sequence in conversations
- carry line breaks between turn-takes
- be page numbered
- have a document cover sheet or header with brief interview or event details such as date, place, interviewer name, interviewee details
Include Data Documentation and Metadata
Data documentation explains how data was created, what it means, and what its content and structure are. It is part of good practice when creating, organising and managing data, and it is important to create sufficient contextual information to make sense of the data. Documentation may include:
- names, labels and descriptions for variables, records and their values
- explanation or definition of codes and classification schemes used
- definitions of specialist terminology or acronyms used
- codes of, and reasons for, missing values
- derived data created after collection, with code, algorithm or command file
- weighting and grossing variables created
- data listing of annotations for cases, individuals or items
Metadata is the label attached to data to describe it. It is extremely important, because most people will forget the details of what a data file or data set contains. Typically metadata will include information on:
- WHAT was collected
- HOW it was collected
- WHEN the data was collected
- WHO collected it
- WHAT format was used
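One lightweight way to keep this information with a data file is a small “sidecar” record saved next to it. The sketch below builds such a record in Python and serialises it as JSON. The field names and example values are hypothetical and do not follow any particular metadata standard; a data centre or funder may require a specific schema instead.

```python
import json
from datetime import date


def build_metadata(what, how, when, who, file_format):
    """Collect the five basic descriptive elements into one dictionary."""
    return {
        "what": what,
        "how": how,
        "when": when.isoformat(),  # ISO 8601 dates sort correctly and are unambiguous
        "who": who,
        "format": file_format,
    }


def to_json(metadata):
    """Serialise the record as readable JSON text, ready to save beside the data file."""
    return json.dumps(metadata, indent=2, sort_keys=True)


# Hypothetical example values.
record = build_metadata(
    what="Counts of fish species at three reef sites",
    how="Visual transect survey, recorded on a handheld device",
    when=date(2024, 5, 1),
    who="Field team A",
    file_format="CSV",
)
print(to_json(record))
```

Saving the text under the same base name as the data file, for example with a `.json` extension, keeps the description and the data together when files are moved or shared.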
You might also like...
JISC Guide to Managing Digital Media
JISC Guidance on Digital File Formats
Good Practice Tips
Good data documentation includes:
- the context of the data, project history, aim, objectives and hypotheses
- data collection methods, sampling, instruments used, hardware and software used, scale and resolution, temporal and geographic coverage and secondary data sources used
- structure of data files, study cases, relationships between files
- data validation, checking, proofing, cleaning and quality assurance procedures carried out
- changes made to data over time since their original creation and identification of different versions of data files
- information on access and use conditions or data confidentiality
Good file naming conventions:
- create meaningful but brief names
- use file names to classify broad types of files
- avoid using spaces and special characters
- avoid very long file names
- create “readme” files to act as memory aids explaining your file name convention
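A naming convention is easier to apply consistently when file names are generated rather than typed by hand. The following Python sketch shows one possible convention: project, content type, date and version, joined by underscores, with spaces and special characters replaced. The convention and the function name are assumptions made for illustration, so adapt them to whatever your project agrees on.

```python
import re
from datetime import date


def make_file_name(project, content_type, created, version, extension):
    """Build a brief, meaningful file name without spaces or special characters."""
    parts = [project, content_type, created.strftime("%Y%m%d"), f"v{version:02d}"]
    # Replace every run of characters other than letters and digits with a hyphen.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-").lower() for part in parts]
    return "_".join(cleaned) + "." + extension.lstrip(".").lower()


print(make_file_name("Reef Survey", "interview", date(2024, 3, 5), 2, ".CSV"))
```

Putting the date in year-month-day order and padding the version number means that files sort chronologically in an ordinary folder listing.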
Best practice to ensure authenticity is to:
- keep a single master file of data
- assign responsibility for master files to a single project team member
- regulate write access to master versions of data files
- record all changes to master files
- maintain old master files in case later ones contain errors
- archive copies of master files at regular intervals
- develop a formal procedure for the destruction of master files
Version Control tips include:
- decide how many versions of a file to keep, which versions to keep, for how long and how to organise versions
- identify milestone versions to keep
- uniquely identify files using a systematic naming convention
- record version and status of a file, e.g. draft, interim, final, internal
- record what changes are made to a file when a new version is created
- record relationships between items where needed, e.g. relationship between code and the data file it is run against; between data file and related documentation or metadata; or between multiple files
- track the location of files if they are stored in a variety of locations
- regularly synchronise files in different locations, e.g. using MS SyncToy software
- maintain single master files in a suitable file format to avoid version control problems associated with multiple working versions of files being developed in parallel
- identify a single location for the storage of milestone and master versions
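Recording the version, status and changes for each file can be as simple as appending a row to a change log. The Python sketch below writes such a log in CSV form. The column names are assumptions, the status values are taken from the examples in the list above, and the log is held in memory here only so that the example is self-contained; in practice it would be a file stored with the master versions.

```python
import csv
import io
from datetime import date

FIELDS = ["file", "version", "status", "date", "changes"]
STATUSES = {"draft", "interim", "final", "internal"}


def append_version(stream, file, version, status, when, changes):
    """Append one row describing a new version of a file to the change log."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow([file, version, status, when.isoformat(), changes])


# An in-memory log for illustration; open a real file in append mode instead.
log = io.StringIO()
csv.writer(log, lineterminator="\n").writerow(FIELDS)
append_version(log, "reef_counts.csv", 1, "draft", date(2024, 3, 5), "initial entry")
append_version(log, "reef_counts.csv", 2, "interim", date(2024, 3, 12), "corrected site codes")
print(log.getvalue())
```

Rejecting unknown status values keeps the log consistent, which matters when several people add entries.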
Case Study Examples
Documenting Data in NVivo
Researchers using qualitative data analysis packages, such as NVivo 9, to analyse data can use a range of the software’s features to describe and document data. Such descriptions both help during analysis and result in essential documentation when data is shared, as they can be exported from the project file alongside data at the end of research. Researchers can create classifications for persons (e.g. interviewees), data sources (e.g. interviews) and coding. Classifications can contain attributes such as the demographic characteristics of interviewees, pseudonyms used, and the date, time and place of interview. If researchers create generic classifications beforehand, attributes can be standardised across all sources or persons throughout the project. Existing template and pre-populated classification sheets can be imported into NVivo.
Documentation files like the methodology description, project plan, interview guidelines and consent form templates can be imported into the NVivo project file and stored in a ‘documentation’ folder in the Memos folder or linked from NVivo 9 externally. Additional documentation about analyses or data manipulations can be created in NVivo as memos. A date- and time-stamped project event log can record all project events carried out during the NVivo project cycle. Additional descriptions can be added to all objects created in, or imported to, the project file such as the project file itself, data, documents, memos, nodes and classifications. All textual documentation compiled during the NVivo project cycle can later be exported as textual files; classifications and event logs can be exported as spreadsheets to document preserved data collections. The structure of the project objects can be exported in groups or individually. Summary information about the project as a whole or groups of objects can be exported via project summary extract reports as a text, MS Excel or XML file.
Data Documentation
Online documentation for a data collection in the UK Data Archive Catalogue can include project instructions, questionnaires, technical reports, and user guides. Researchers typically create metadata records for their data by completing a data centre’s data deposit form or metadata editor, or by using a metadata creation tool, like Go-Geo! GeoDoc or the UK Location Metadata Editor. Providing detailed and meaningful dataset titles, descriptions, keywords and other information enables data centres to create rich resource-discovery metadata for archived data collections. Data centres accompany each dataset with a bibliographic citation that users are required to cite in research outputs to reference and acknowledge accurately the data source used. A citation gives credit to the data source and distributor and identifies data sources for validation.
File Formatting
The Wessex Archaeology Metric Archive Project has brought together metric animal bone data from a range of archaeological sites in England into a single database format. The dataset contains a selection of measurements commonly taken during Wessex Archaeology zooarchaeological analysis of animal bone fragments found during field investigations. It was created by the researchers in MS Excel and MS Access formats and deposited with the Archaeology Data Service (ADS) in the same formats. ADS has preserved the dataset in Oracle and in comma-separated values (CSV) format and disseminates the data both through an Oracle/Cold Fusion live interface and as downloadable CSV files.
File Conversions
The JISC-funded Data Management for Bio-Imaging project at the John Innes Centre developed Bioformats Converter software to batch convert bio-images from a variety of proprietary microscopy image formats to the Open Microscopy Environment format, OME-TIFF. OME-TIFF, an open file format that enables data sharing across platforms, maintains the original image metadata in the file in XML format.
Storing
You’ve invested a lot of time and effort in creating your data, so keep it safe. Throughout the life of your project you need to continuously think about solutions for storing data carefully. Many forms of storage media are inherently unreliable, and all file formats and physical storage media will ultimately become obsolete.
Back-up!
Making back-ups of files is an essential element of data management. Back-ups protect against accidental or malicious data loss through:
- hardware or software failure
- virus infection or malicious hacking
- human error
It is worthwhile checking that you can recover the files you have backed up. External cloud based storage is a good solution, but double check the security features offered, including recovery of files. If you plan to store any business critical or personal information make sure your chosen method complies with Data Protection legislation and best practice.
Share!
Sharing data between collaborators is a challenge. Anything sent by email persists in a number of unknown exchange servers – the sender’s, the receiver’s and others in-between – so relying on this as a method of data transfer is not good practice. Cloud-based or online file sharing services may be suitable for sharing certain types of data, but they are not recommended for data that may be confidential, because users do not control where data is ultimately stored. Researchers should be aware of the risks and benefits of each type of solution so they can make informed decisions about which to use.
Think of the Long Term!
For long-term storage of complete data sets once you are ready to publish, the RGU library can help you protect, preserve, archive, and share your research data.
All research activity associated with RGU is an asset of the University and so RGU has a responsibility to secure, store and access all research data, within the bounds of any IP or confidentiality agreement.
To ensure this, RGU is providing R:\drives for researchers, including research students. These provide additional basic data storage space, which can be shared with named individuals who have an RGU login e.g. PIs, research team members, research students and supervisors. They do not provide additional processing or compute power.
Research students will have an R:\drive created for them shortly after they commence their studies, typically when they have completed Module 1 of PGCert.
Data is held securely and privately and so the R:\drive is ideal for confidential or sensitive data. The R:\drive can be accessed via Citrix remotely in the same way as H:\drives.
Good Practice Tips
- Store data in non-proprietary or open standard formats
- Create digital versions of paper documentation in PDF/a format for long-term preservation and storage
- Often research data and outputs that have been created collaboratively are available via a web site. Although this is an excellent means of disseminating research, data can be particularly vulnerable if the host institution closes the web site. Do not therefore rely on this method as a robust means of securing data.
- Copy or migrate data files to new media between two and five years after they were first created, since both optical and magnetic media are subject to physical degradation
- Consider whether to back up particular files or the entire computer system (complete system image)
- Consider the frequency of back-up needed: after each change to a data file or at regular intervals
- Check the data integrity of stored data files at regular intervals
- Use a storage strategy, even for a short-term project, with two different forms of storage, e.g. cloud storage and a hard drive
- Apply back-up strategies to all systems where data are held, including portable computers and devices, non-network computers and home-based computers
- Organise and clearly label stored data so they are easy to locate and physically accessible
- Ensure that areas and rooms for storage of digital or non-digital data are fit for the purpose, structurally sound, and free from the risk of flood and fire
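Checking the integrity of stored files at regular intervals is commonly done by comparing checksums: a short fingerprint is computed for each file when it is stored, and recomputed later to see whether anything has changed. The Python sketch below uses SHA-256 from the standard library. The function names and the temporary folder are assumptions for illustration; the same comparison can be used to confirm that a back-up copy matches the original.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file under `folder` (by relative path) to its checksum."""
    folder = Path(folder)
    return {
        str(path.relative_to(folder)): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(old, new):
    """List files from the old manifest that are missing or differ in the new one."""
    return sorted(name for name in old if new.get(name) != old[name])


# Demonstration in a temporary folder: store a file, alter it, then re-check.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "counts.csv"
    data_file.write_text("site,count\nA,12\n")
    baseline = build_manifest(tmp)
    data_file.write_text("site,count\nA,13\n")
    damaged = changed_files(baseline, build_manifest(tmp))
print(damaged)
```

The baseline manifest should itself be stored somewhere safe, separate from the data it describes, so that both cannot be altered together.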
Case Studies and Examples
Data Backup and Storage
A research team carrying out coral reef research collects field data using handheld Personal Digital Assistants (PDAs). Digital data are transmitted daily to the institution’s network drive, where they are held in password-protected files. All data files are identified by an individual version number and creation date. Version information (version numbers and notes detailing differences between versions) is stored in a spreadsheet, also on the network drive. The institution’s network drive is fully backed-up onto Ultrium LTO2 data tapes. Incremental back-ups are made daily Monday to Thursday; full server back-ups are made from Friday to Sunday. Tapes are securely stored in a separate building. Upon completion of the research the data are deposited in the institution’s digital repository.
Survey of Anglo Welsh Dialects
In February 2008 the British Library (BL) received the recorded output of the Survey of Anglo-Welsh Dialects (SAWD), carried out by University College, Swansea, between 1969 and 1995. This survey recorded the English spoken in Wales by interviewing and tape-recording elderly speakers on topics including the farm and farming, the house and housekeeping, nature, animals, social activities and the weather. The collection was deposited in the form of 503 digital audio files, which were accessioned as .wav files in the BL’s Digital Library. Digital clones of all files are held at the Archive of Welsh English, alongside the original master recordings on 151 audio cassettes, from which the digital copies were created.
The BL’s Digital Library is mirrored on four sites – at Boston Spa, St Pancras, Aberystwyth and a ‘dark’ archive which is provided by a third party. Each of these servers has inbuilt integrity checks. The BL makes available access copies for users, in the form of .mp3 audio files, in the British Library Reading Rooms via the Soundserver system. A small set of audio extracts from the SAWD recordings is also available online on the BL’s Accents and Dialects web site, Sounds Familiar.