Research data management explained

What is research data?

Research data is any information that has been collected, observed, generated or created to validate original research findings.

Although usually digital, research data also includes non-digital formats such as laboratory notebooks and diaries.

Types of research data

Research data can take many forms. It might be:

documents, spreadsheets
laboratory notebooks, field notebooks, diaries
questionnaires, transcripts, codebooks
audiotapes, videotapes
photographs, films
test responses
slides, artefacts, specimens, samples
collections of digital outputs
data files
database contents (video, audio, text, images)
models, algorithms, scripts
contents of an application (input, output, log files for analysis software, simulation software, schemas)
methodologies and workflows
standard operating procedures and protocols

Non-digital data

Non-digital data such as laboratory notebooks, ice-core samples and sketchbooks is often unique. You should assess the long-term value of any non-digital data and plan how you will describe and retain it.

You could digitise the materials, but this may not be possible for all types of data.

The University of Leeds research data repository (Research Data Leeds) describes digital materials and can also be used to create records for physical artefacts.

Please contact the team if you would like to discuss requirements for non-digital data.

Sources of research data

Research data can be generated for different purposes and through different processes.

Observational data is captured in real time and is usually irreplaceable. Examples are sensor data, survey data, sample data, and neuro-images.
Experimental data is captured from lab equipment. It is often reproducible, but this can be expensive. Examples of experimental data are gene sequences, chromatograms, and toroid magnetic field data.
Simulation data is generated from test models, where the model and its metadata are more important than the output data. Examples are climate models and economic models.
Derived or compiled data has been transformed from pre-existing data points. It is reproducible if lost, but this would be expensive. Examples are data-mining results, compiled databases, and 3D models.
Reference or canonical data is a static or organic conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated. Examples are gene sequence databanks, chemical structures, and spatial data portals.
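
If you keep an inventory of your datasets, the five sources above could be recorded alongside each entry. The following is a minimal sketch in Python, not a University of Leeds tool: the names DataSource, LOSS_NOTES and describe are hypothetical, and the notes simply paraphrase the descriptions above.

```python
from enum import Enum


class DataSource(Enum):
    """The five sources of research data described above."""
    OBSERVATIONAL = "observational"
    EXPERIMENTAL = "experimental"
    SIMULATION = "simulation"
    DERIVED = "derived"
    REFERENCE = "reference"


# Hypothetical lookup table: each note paraphrases the guidance text,
# it does not add any new policy.
LOSS_NOTES = {
    DataSource.OBSERVATIONAL: "usually irreplaceable",
    DataSource.EXPERIMENTAL: "often reproducible, but this can be expensive",
    DataSource.SIMULATION: "model and metadata matter more than the output data",
    DataSource.DERIVED: "reproducible if lost, but this would be expensive",
    DataSource.REFERENCE: "most probably published and curated",
}


def describe(dataset_name: str, source: DataSource) -> str:
    """Return a one-line inventory note for a dataset."""
    return f"{dataset_name} ({source.value}): {LOSS_NOTES[source]}"


if __name__ == "__main__":
    # Illustrative dataset name only.
    print(describe("river sensor readings", DataSource.OBSERVATIONAL))
```

A note like this can help when deciding which datasets most need careful description and retention, for example by flagging observational data that cannot be captured again.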