Creating and Using Research Data
“Will I need this material to support a publication, or validate my research findings?”
“Will this item form part of a finalised data set once my work is complete?”
If the answer is “yes” to either of these questions, this will be part of your research data. Research records will usually need to be kept too, for audit purposes.
Research data can include:
- Recorded outputs of observations, experiments, or simulations
- Lab Books and Logs
- Models created and used to perform simulations and experiments
- Software tools created to capture, analyse, or otherwise use data
- Documentation that describes the project context, methods used, and data outputs produced, including email correspondence between collaborators
Funding bodies usually want to see that the data you gather or create fills a gap in knowledge, and require you to demonstrate this. It is often cost-effective to re-use data created elsewhere in different ways, perhaps creating a “mash-up” of data from different sources to demonstrate something new. This is attractive to funding bodies, because it means they are not funding the same data-gathering exercise twice.
Data Management Planning
A Data Management Plan (DMP) helps researchers and research students with their research methodology. Data Management Planning is an RGU requirement, and in many cases it is also becoming a funder requirement at the point of submission.
A DMP covers the following basics:
Data Management Plans: Examples
Good Practice Tips
- Know your legal, ethical and other obligations regarding research data, towards research participants, colleagues, research funders and institutions
- Implement good practices in a consistent manner
- Assign roles and responsibilities to relevant parties in the research
- Design data management according to the needs and purpose of research
- Incorporate data management measures as an integral part of your research cycle
- Implement and review data management throughout research as part of research progression and review
Case Study Examples
Writing a Data Management Plan
In April 2010, the Digital Curation Centre (DCC) launched DMP Online, a web-based tool designed to help researchers and other data stakeholders develop data management plans according to the requirements of major research funders.
Using the tool, researchers can create, store and update multiple versions of a data management plan at the grant application stage and during the research cycle. Plans can be customised and exported in various formats. Funder- and institution-specific best practice guidance is available.
The tool combines the DCC’s comprehensive ‘Checklist for a Data Management Plan’ with an analysis of research funder requirements. The DCC is working with partner organisations to include domain- and subject-specific guidance in the tool.
Submitting a Data Management Plan
The Rural Economy and Land Use (RELU) Programme has been at the forefront of implementing data management planning for research projects since 2004. Drawing on best practice in data management and sharing across three research councils (ESRC, NERC and BBSRC), RELU requires that all funded projects develop and implement a Data Management Plan to ensure that data are well managed throughout the duration of a research project. In a data management plan researchers describe:
- the need for access to existing data sources
- data to be produced by the research project
- quality assurance and back-up procedures
- plans for management and archiving of collected data
- expected difficulties in making data available for secondary research and measures to overcome such difficulties
- who holds copyright and Intellectual Property Rights of the data
- who has data management responsibility roles within the research team
The format and software used to create research data depend on the hardware or software available, on how researchers plan to analyse the data and, in some cases, on discipline-specific standards and customs.
Good Practice Tips
Good data documentation includes:
- the context of the data, project history, aim, objectives and hypotheses
- data collection methods, sampling, instruments used, hardware and software used, scale and resolution, temporal and geographic coverage and secondary data sources used
- structure of data files, study cases, relationships between files
- data validation, checking, proofing, cleaning and quality assurance procedures carried out
- changes made to data over time since their original creation and identification of different versions of data files
- information on access and use conditions or data confidentiality
Good file naming conventions:
- create meaningful but brief names
- use file names to classify broad types of files
- avoid using spaces and special characters
- avoid very long file names
- create “readme” files to act as memory aids explaining your file name convention
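The naming tips above can be applied automatically. The following is a minimal sketch rather than a prescribed convention: the function name `make_file_name`, the 40-character limit and the `category_description.ext` pattern are all assumptions chosen for illustration.

```python
import re
import unicodedata

MAX_LENGTH = 40  # assumed limit for "brief" names; adjust to suit your project


def make_file_name(category: str, description: str, extension: str) -> str:
    """Build a name such as 'interview_site-visit-notes.txt' (hypothetical pattern)."""

    def clean(text: str) -> str:
        # Drop accents, lowercase, and replace runs of spaces or special
        # characters with a single hyphen.
        ascii_text = (
            unicodedata.normalize("NFKD", text)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

    # The category prefix classifies the broad type of file; the stem is
    # truncated so that names do not become very long.
    stem = f"{clean(category)}_{clean(description)}"[:MAX_LENGTH].rstrip("-_")
    return f"{stem}.{extension.lower().lstrip('.')}"


print(make_file_name("Interview", "Site visit: notes (draft)!", ".TXT"))
# prints: interview_site-visit-notes-draft.txt
```

Whatever pattern you settle on, describe it in the project's readme file so that collaborators can interpret the names.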
Best practice to ensure authenticity is to:
- keep a single master file of data
- assign responsibility for master files to a single project team member
- regulate write access to master versions of data files
- record all changes to master files
- maintain old master files in case later ones contain errors
- archive copies of master files at regular intervals
- develop a formal procedure for the destruction of master files
Version Control tips include:
- decide how many versions of a file to keep, which versions to keep, for how long and how to organise versions
- identify milestone versions to keep
- uniquely identify files using a systematic naming convention
- record version and status of a file, e.g. draft, interim, final, internal
- record what changes are made to a file when a new version is created
- record relationships between items where needed, e.g. relationship between code and the data file it is run against; between data file and related documentation or metadata; or between multiple files
- track the location of files if they are stored in a variety of locations
- regularly synchronise files in different locations, e.g. using MS SyncToy software
- maintain single master files in a suitable file format to avoid version control problems associated with multiple working versions of files being developed in parallel
- identify a single location for the storage of milestone and master versions
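Several of these tips (systematic naming, recording version and status, and recording what changed) can be combined in a small script. The sketch below is illustrative only: the `stem_vNN_status.ext` pattern, the function names and the example date and note are assumptions, and an in-memory buffer stands in for a real change-log file.

```python
import csv
import io
from datetime import date

# Statuses taken from the examples in the tips above.
STATUSES = {"draft", "interim", "final", "internal"}


def versioned_name(stem: str, version: int, status: str, extension: str) -> str:
    """Return a name such as 'survey-data_v03_draft.csv' (hypothetical pattern)."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    return f"{stem}_v{version:02d}_{status}.{extension}"


def log_change(log_file, file_name: str, changed_on: date, note: str) -> None:
    """Append one row recording what changed when this version was created."""
    csv.writer(log_file).writerow([file_name, changed_on.isoformat(), note])


log = io.StringIO()  # stands in for a change-log file kept alongside the data
name = versioned_name("survey-data", 3, "draft", "csv")
log_change(log, name, date(2024, 1, 15), "Removed duplicate respondents")
print(log.getvalue().strip())
# prints: survey-data_v03_draft.csv,2024-01-15,Removed duplicate respondents
```

Zero-padding the version number keeps files in the right order when they are sorted by name.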
Case Study Examples
Documenting Data in NVivo
Researchers using qualitative data analysis packages, such as NVivo 9, to analyse data can use a range of the software’s features to describe and document data. Such descriptions both help during analysis and result in essential documentation when data is shared, as they can be exported from the project file alongside data at the end of research. Researchers can create classifications for persons (e.g. interviewees), data sources (e.g. interviews) and coding. Classifications can contain attributes such as the demographic characteristics of interviewees, pseudonyms used, and the date, time and place of interview. If researchers create generic classifications beforehand, attributes can be standardised across all sources or persons throughout the project. Existing template and pre-populated classification sheets can be imported into NVivo.
Documentation files like the methodology description, project plan, interview guidelines and consent form templates can be imported into the NVivo project file and stored in a ‘documentation’ folder in the Memos folder or linked from NVivo 9 externally. Additional documentation about analyses or data manipulations can be created in NVivo as memos. A date- and time-stamped project event log can record all project events carried out during the NVivo project cycle. Additional descriptions can be added to all objects created in, or imported to, the project file such as the project file itself, data, documents, memos, nodes and classifications. All textual documentation compiled during the NVivo project cycle can later be exported as textual files; classifications and event logs can be exported as spreadsheets to document preserved data collections. The structure of the project objects can be exported in groups or individually. Summary information about the project as a whole or groups of objects can be exported via project summary extract reports as a text, MS Excel or XML file.
Online documentation for a data collection in the UK Data Archive Catalogue can include project instructions, questionnaires, technical reports, and user guides. Researchers typically create metadata records for their data by completing a data centre’s data deposit form or metadata editor, or by using a metadata creation tool, like Go-Geo! GeoDoc or the UK Location Metadata Editor. Providing detailed and meaningful dataset titles, descriptions, keywords and other information enables data centres to create rich resource-discovery metadata for archived data collections. Data centres accompany each dataset with a bibliographic citation that users are required to cite in research outputs to reference and acknowledge accurately the data source used. A citation gives credit to the data source and distributor and identifies data sources for validation.
The Wessex Archaeology Metric Archive Project has brought together metric animal bone data from a range of archaeological sites in England into a single database format. The dataset contains a selection of measurements commonly taken during Wessex Archaeology zooarchaeological analysis of animal bone fragments found during field investigations. It was created by the researchers in MS Excel and MS Access formats and deposited with the Archaeology Data Service (ADS) in the same formats. ADS has preserved the dataset in Oracle and in comma-separated values (CSV) format, and disseminates the data both via an Oracle/Cold Fusion live interface and as downloadable CSV files.
The JISC-funded Data Management for Bio-Imaging project at the John Innes Centre developed Bioformats Converter software to batch convert bio-images from a variety of proprietary microscopy image formats to the Open Microscopy Environment format, OME-TIFF. OME-TIFF, an open file format that enables data sharing across platforms, maintains the original image metadata in the file in XML format.
Making back-ups of files is an essential element of data management that protects against accidental or malicious data loss through:
- hardware or software failure
- virus infection or malicious hacking
- human error
It is worthwhile checking that you can recover the files you have backed up. External cloud-based storage is a good solution, but double-check the security features offered, including recovery of files. If you plan to store any business-critical or personal information, make sure your chosen method complies with Data Protection legislation and best practice.
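One way to check that a backed-up file can be recovered intact is to compare a checksum of the restored copy with a checksum of the original. The sketch below uses SHA-256 from Python's standard library; the file names and contents are made up for the demonstration, and the approach is offered as an illustration rather than an institutional procedure.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """True when the backup copy has the same checksum as the original."""
    return sha256_of(original) == sha256_of(backup)


# Demonstration with temporary files; real paths would replace these.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "results.csv"
    backup = Path(tmp) / "results_backup.csv"
    original.write_text("site,count\nA,12\n")
    shutil.copy2(original, backup)
    intact = backup_matches(original, backup)
    backup.write_text("site,count\nA,13\n")  # simulate a corrupted backup
    corrupted = backup_matches(original, backup)

print(intact, corrupted)
# prints: True False
```

Storing the checksums in a separate file also makes it possible to re-check stored data at regular intervals without needing the original to hand.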
Sharing data between collaborators is a challenge. Anything sent by email persists in a number of unknown exchange servers – the sender’s, the receiver’s and others in-between – so relying on this as a method of data transfer is not good practice. Cloud-based or online file sharing services may be suitable for sharing certain types of data, but they are not recommended for data that may be confidential, because users do not control where data is ultimately stored. Researchers should be aware of the risks and benefits of each type of solution so they can make informed decisions about which to use.
Think of the Long Term!
In terms of long term storage of complete data sets once you are ready to publish, RGU library can help you protect, preserve, archive, and share your research data.
Good Practice Tips
- Store data in non-proprietary or open standard formats
- Create digital versions of paper documentation in PDF/A format for long-term preservation and storage
- Often research data and outputs that have been created collaboratively are available via a web site. Although this is an excellent means of disseminating research, data can be particularly vulnerable if the host institution closes the web site. Do not therefore rely on this method as a robust means of securing data.
- Copy or migrate data files to new media between two and five years after they were first created, since both optical and magnetic media are subject to physical degradation
- Consider whether to back-up particular files or the entire computer system (complete system image)
- Consider the frequency of back-up needed: after each change to a data file or at regular intervals
- Check the data integrity of stored data files at regular intervals
- Use a storage strategy, even for a short-term project, with two different forms of storage, e.g. on the cloud and on a hard drive
- Consider back-up strategies for all systems where data are held, including portable computers and devices, non-network computers and home-based computers
- Organise and clearly label stored data so they are easy to locate and physically accessible
- Ensure that areas and rooms for storage of digital or non-digital data are fit for the purpose, structurally sound, and free from the risk of flood and fire
Case Studies and Examples
Data Backup and Storage
A research team carrying out coral reef research collects field data using handheld Personal Digital Assistants (PDAs). Digital data are transmitted daily to the institution’s network drive, where they are held in password-protected files. All data files are identified by an individual version number and creation date. Version information (version numbers and notes detailing differences between versions) is stored in a spreadsheet, also on the network drive. The institution’s network drive is fully backed-up onto Ultrium LTO2 data tapes. Incremental back-ups are made daily Monday to Thursday; full server back-ups are made from Friday to Sunday. Tapes are securely stored in a separate building. Upon completion of the research the data are deposited in the institution’s digital repository.
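The weekly rotation described in this case study can be expressed as a simple rule. The sketch below is only an illustration of that schedule; the function name `backup_type` is an assumption, and real back-up software would normally be configured through its own scheduler.

```python
from datetime import date


def backup_type(day: date) -> str:
    """Return 'incremental' for Monday to Thursday and 'full' for Friday to Sunday."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return "incremental" if day.weekday() <= 3 else "full"


print(backup_type(date(2024, 1, 15)))  # a Monday
print(backup_type(date(2024, 1, 19)))  # a Friday
```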
Survey of Anglo Welsh Dialects
In February 2008 the British Library (BL) received the recorded output of the Survey of Anglo-Welsh Dialects (SAWD), carried out by University College, Swansea, between 1969 and 1995. This survey recorded the English spoken in Wales by interviewing and tape-recording elderly speakers on topics including the farm and farming, the house and housekeeping, nature, animals, social activities and the weather. The collection was deposited in the form of 503 digital audio files, which were accessioned as .wav files in the BL’s Digital Library. Digital clones of all files are held at the Archive of Welsh English, alongside the original master recordings on 151 audio cassettes, from which the digital copies were created.
The BL’s Digital Library is mirrored on four sites – at Boston Spa, St Pancras, Aberystwyth and a ‘dark’ archive which is provided by a third party. Each of these servers has inbuilt integrity checks. The BL makes available access copies for users, in the form of .mp3 audio files, in the British Library Reading Rooms via the Soundserver system. A small set of audio extracts from the SAWD recordings is also available online on the BL’s Accents and Dialects web site, Sounds Familiar.