Research Data Management Glossary

data.bris, 14 May 2013

Dr Virginia Knight, David Boyd, Stephen Gray


Archive

A collection of data and/or records stored with a view to long-term preservation.


Backup

A copy of data or software which can be used to recover and restore it if the main copy is unusable.

Born digital

Refers to materials that originated in a digital form, rather than those that were derived from a physical form (for example by being scanned or transcribed).

Collection-level description

A set of metadata describing a collection of data as a whole, as opposed to individual items within it.

Controlled vocabulary

A type of vocabulary scheme which mandates the use of predefined, authorised terms which have been selected by the designer of the vocabulary, in contrast to natural language vocabularies, where there is no restriction on the vocabulary.
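As an illustration, a controlled vocabulary can be enforced by checking each proposed term against the authorised list. The Python sketch below assumes a small, hypothetical vocabulary of subject terms; the names AUTHORISED_TERMS and check_terms are invented for this example and do not come from any particular scheme.

```python
# Hypothetical authorised terms; a real vocabulary is maintained by its designer.
AUTHORISED_TERMS = {"geology", "hydrology", "seismology"}


def check_terms(terms):
    """Split proposed terms into (accepted, rejected) against the authorised list."""
    accepted, rejected = [], []
    for term in terms:
        # Ignore case and surrounding whitespace when comparing.
        normalised = term.strip().lower()
        if normalised in AUTHORISED_TERMS:
            accepted.append(normalised)
        else:
            rejected.append(term)
    return accepted, rejected
```

A natural language vocabulary would simply accept every term; the rejection step is what makes the vocabulary controlled.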

Creative Commons licences

A means of managing the terms which are automatically attached to copyrighted works, including research data. The suite of licences offered by the Creative Commons organisation enables and encourages the re-use, sharing and distribution of works. For example, when applied to your work, the ‘By-Attribution, Non-Commercial’ licence allows others to use and build upon your data, although their new work must acknowledge you and must not be used for commercial purposes.

Curation (or data curation)

The process of actively managing data, from the point of creation through all the stages in its existence, to ensure it is accessible and fit for purpose.

Data cleaning

Deleting or editing corrupt or inaccurate parts of data, in order to achieve data integrity.
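A minimal Python sketch of data cleaning follows. It assumes a hypothetical table of temperature readings held as dictionaries with "site" and "temp_c" fields, and an assumed plausible range of -90 to 60 degrees Celsius; both the field names and the range are illustrative choices, not part of any standard.

```python
def clean_readings(rows):
    """Return (clean, discarded) rows from dicts with 'site' and 'temp_c' fields."""
    clean, discarded = [], []
    for row in rows:
        site = (row.get("site") or "").strip()
        try:
            temp = float(row.get("temp_c"))
        except (TypeError, ValueError):
            # Missing or non-numeric temperature: treat the row as corrupt.
            discarded.append(row)
            continue
        # Assumed plausibility range; a blank site name is also rejected.
        if not site or not -90.0 <= temp <= 60.0:
            discarded.append(row)
            continue
        # Edit the row into a consistent form (trimmed name, numeric value).
        clean.append({"site": site, "temp_c": temp})
    return clean, discarded
```

Keeping the discarded rows, rather than silently dropping them, lets the cleaning step be reviewed later.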

Data integrity

The completeness, accuracy and freedom from error of a data set.

Data protection

The appropriate management and processing of data which identifies individuals.

Data Protection Act 1998 (DPA 1998)

A United Kingdom Act of Parliament which defines UK law on the processing of data or information concerning individuals, including the obtaining, holding, use or disclosure of such information. The Act is the main piece of UK legislation which governs individuals' rights over their personal data, and also protects individuals from the misuse of their personal data. The Act also requires that those who handle and process personal data should comply with a number of important principles and legal obligations.

Data repository

See Repository.

Data Steward

An employee of the University who is responsible for the lifecycle of a set of research data, usually the Principal Investigator (PI) of the project.

Data service

An online digital repository relating to a particular discipline or disciplines. In the UK, these are typically funded by one or more Research Councils.

Data sharing

The act of making data available to people other than those who originally produced it or used it for their research.

Data management plan

A description of how the data produced by a research project will be handled and curated both during and after the project.

Dataset (or data set)

A defined collection of data with common elements.

Deposit (or Data deposit)

The process of committing data to a repository or other storage facility.

Digital Object Identifier (DOI)

A name which provides a means of persistently identifying a digital object, which may be a dataset, and associating it with related current data in a structured extensible way.

Digital preservation

The set of processes, activities and management of digital information over time to ensure its long-term accessibility. Because of the relatively short lifecycle of digital information, preservation is an ongoing process.


Embargo

A period during which certain types of users are not allowed access to research data. This is either to protect the revenue of the publisher or (more generally) to protect the interests of other parties (for example, partner research organisations).
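As a rough illustration, an access restriction of this kind can be expressed as a date comparison. The Python sketch below assumes a single end date and a simple distinction between project members and everyone else; the function name and parameters are hypothetical, and a real repository is likely to apply more detailed rules.

```python
from datetime import date


def is_accessible(restriction_end, today, user_is_project_member=False):
    """Project members always have access; others only from the end date onwards."""
    return user_is_project_member or today >= restriction_end
```

For example, with an end date of 1 January 2014, a member of the public is refused on 14 May 2013 and allowed from 1 January 2014 onwards.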

Encryption (or Data encryption)

Encoding or other modification of data in order to protect it from unauthorised access and/or changes.

Freedom of Information (FOI)

The right of the general public to access information held by public authorities (including universities). In the UK, FOI is governed by the Freedom of Information Act 2000.

Ingest (or Data ingest)

The transferral of data to a repository, archive or other storage.

Institutional repository

An online locus for collecting, preserving, and disseminating materials such as journal articles and digital versions of theses and dissertations. It might also include other digital resources generated by a research institution such as administrative documents, course notes or learning objects.

Information security

The protection of information and information systems from unauthorised access, use, disclosure, disruption, modification, perusal, inspection, recording or destruction. Within the University of Bristol, information is classified as one of the following: Public, Open, Confidential, Strictly confidential or Secret.

Intellectual Property Rights (IPR)

The legal rights which individuals and institutions have over their intellectual property, determining who can copy, distribute, adapt, use, or profit from it.


Interoperability

The ability of diverse systems, content and organisations to work together (inter-operate).


Keyword

A piece of metadata consisting of a short term classifying the content of all or part of some data.

Lifecycle (or Curation lifecycle)

The complete history of a piece of data and its curation, from creation onwards, conceived as a series of steps or stages.


Metadata

Structured data about data.

Metadata element

A particular property of a piece of data, used as metadata.

Metadata schema

A framework specifying which metadata elements can or should accompany particular types of data, and the meanings of those elements.
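To make this concrete, the Python sketch below represents a very small, hypothetical schema as a mapping from element name to whether it is required and what it means, and checks a metadata record against it. The element names are illustrative only and are not taken from any published schema.

```python
# Hypothetical schema: element name -> (required?, meaning of the element).
SCHEMA = {
    "title": (True, "Name given to the dataset"),
    "creator": (True, "Person or body responsible for creating the data"),
    "date": (True, "Date on which the data was created"),
    "keywords": (False, "Short terms classifying the content"),
}


def validate_record(record):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for element, (required, _meaning) in SCHEMA.items():
        # A required element must be present and non-empty.
        if required and not record.get(element):
            problems.append(f"missing required element: {element}")
    for element in record:
        # Elements the schema does not define are reported, not silently kept.
        if element not in SCHEMA:
            problems.append(f"unknown element: {element}")
    return problems
```

The schema covers both halves of the definition: the required flag says which elements can or should accompany the data, and the description records what each element means.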

Migration (or Digital migration)

The transfer of data from one format or software platform to another.

Open source (antonym: proprietary)

A philosophy which promotes free redistribution of and access to an end product, its design and implementation details.

Open data

The idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of intellectual property control.

Persistent identifier

A name, such as a DOI, which identifies an object unambiguously and permanently.

Preservation (or Digital preservation)

The process of ensuring that digital files are available and comprehensible in the long term.

Repository (or Digital repository)

A facility where data may be deposited and preserved safely, with regulated access to it.

Research data

Data, or units of information, which are created in the course of funded or unfunded research, and often arranged or formatted in such a way as to make them suitable for communication, interpretation, and processing, perhaps by a computer. Examples of research data may include a spreadsheet of statistics, a series of email messages, a sound recording of an interview, a descriptive record of a rock specimen, or a collection of digital images. Research data does not include data generated in the course of personal activities, desktop or mailbox backups, or data produced by non-research activities such as University administration and teaching.

In other words: Research data is digital information created in the course of research but which isn't a published research output (see below). Research data excludes purely administrative records. The highest priority research data is that which underpins a research output.

Research information

Administrative information relating to the research process. Examples include staff profiles and financial information.

Research output

According to the Research Assessment Exercise (RAE), assessable research output contains an element of innovation, contributes to scholarship, is publicly accessible and is generalisable. Research output is not limited to texts.


Sustainability

The process of ensuring that digital outputs remain publicly accessible and usable beyond the end of funding.


Tag

A non-hierarchical and non-taxonomic keyword which may be included in the metadata for a piece of data or dataset.


Version control

The management of different versions of data or software, for example by a naming scheme.
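One possible naming scheme is sketched below in Python. It assumes a hypothetical convention in which file names end in _vNN before the extension (for example survey_v03.csv); the pattern and the function name are illustrative, and other conventions, such as date stamps, would serve equally well.

```python
import os
import re

# Assumed convention: <stem>_v<NN><extension>, e.g. survey_v03.csv
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)(?P<suffix>\.[^.]+)?$")


def next_version_name(filename):
    """Return the file name to use for the next version of a file."""
    match = VERSION_PATTERN.match(filename)
    if match is None:
        # No version marker yet: start numbering at v01.
        stem, ext = os.path.splitext(filename)
        return f"{stem}_v01{ext}"
    number = int(match.group("num")) + 1
    # Keep the existing zero padding, with a minimum of two digits.
    width = max(2, len(match.group("num")))
    suffix = match.group("suffix") or ""
    return f"{match.group('stem')}_v{number:0{width}d}{suffix}"
```

Zero-padded numbers keep versions in the correct order when files are sorted by name.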


Acknowledgements

This glossary was produced by the data.bris project at the University of Bristol, part-funded by JISC as part of its Research Data Management programme. The authors would like to acknowledge the following prior work on Research Data Management which has informed and influenced this glossary.