For Authors: How and Why to Cite Data

Why Cite Data?

Citing data, the same way you would cite a research article, has three main functions (adapted from ICPSR):
  1. Making the cited data findable for your readers
  2. Allowing indexing tools such as Web of Science or Scopus to track their use and thus give credit to the data's creators
  3. Allowing funders and researchers to measure the impact of data

How to Cite Data

Data citations look like other citations, but they reference data sets rather than papers. It is perfectly fine to reference both a data set and the paper it came from, if appropriate. Here is how data might be cited in a (made-up) study:

A (fake) study on lower back pain

To study the neural processing of lower back pain, we analyzed an fMRI data set from individuals exhibiting chronic lower back pain (Vrana et al., 2015a) from the study by Vrana et al. (2015b). And, just to put in a second example from a different repository, we utilized the replication data set from Cranmer et al. (2016a) to test our network analysis.

References

Cranmer, Skyler; Leifeld, Philip; McClurg, Scott; Rolfe, Meredith, (2016a), "Replication Data for: Navigating the Range of Statistical Tools for Inferential Network Analysis", Harvard Dataverse, http://doi.org/10.7910/DVN/2XP8YF, V1 [UNF:6:agrnQnH86oRB/yOd+p8V4A==]

Vrana A, Hotz-Boendermaker S, Stämpfli P, Hänggi J, Seifritz E, Humphreys BK, Meier ML (2015a) Data from: Differential neural processing during motor imagery of daily activities in chronic low back pain patients. Dryad Digital Repository. http://doi.org/10.5061/dryad.2h0q3

Vrana A, Hotz-Boendermaker S, Stämpfli P, Hänggi J, Seifritz E, Humphreys BK, Meier ML (2015b) Differential neural processing during motor imagery of daily activities in chronic low back pain patients. PLOS ONE 10(11): e0142391. http://doi.org/10.1371/journal.pone.0142391

Anatomy of a Data Citation

Let's take a closer look at a data citation. It looks a lot like a citation to a journal article, with typical elements such as authors, date, and title. It also has some features that are specific to data citations, such as a version number and, in the example below, a numeric fingerprint. Finally, a good data citation should always include a unique, persistent identifier. Most commonly, as in our example, this will be a digital object identifier (DOI), but there are other options, particularly in scientific fields with a long tradition of data repositories. In 2014, FORCE11 coordinated and published a set of eight principles for data citation. In the illustration below, we associate these principles with elements of a data citation.

[Illustration: Anatomy of a Data Citation]
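
For authors who manage their references programmatically, the elements named above (authors, date, title, repository, persistent identifier, version, and fingerprint) can be put together roughly as follows. This is only a minimal sketch: the DataCitation class, its field names, and the render method are hypothetical, the punctuation is one possible style rather than a prescribed one, and the https form of the doi.org resolver prefix is an assumption. Always check the citation style your journal or repository requires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical container for the elements of a data citation."""
    authors: List[str]
    year: int
    title: str
    repository: str
    doi: str                           # bare DOI, without a resolver prefix
    version: Optional[str] = None      # specific to data citations
    fingerprint: Optional[str] = None  # e.g. a UNF string, if the repository provides one

    def render(self) -> str:
        """Join the elements into one reference-list entry (illustrative style only)."""
        parts = [
            f"{'; '.join(self.authors)} ({self.year})",
            f'"{self.title}"',
            self.repository,
            f"https://doi.org/{self.doi}",  # assumed resolver prefix
        ]
        if self.version:
            parts.append(self.version)
        text = ", ".join(parts)
        if self.fingerprint:
            text += f" [{self.fingerprint}]"
        return text


# The Harvard Dataverse example from the reference list above.
citation = DataCitation(
    authors=["Cranmer, Skyler", "Leifeld, Philip", "McClurg, Scott", "Rolfe, Meredith"],
    year=2016,
    title=("Replication Data for: Navigating the Range of Statistical Tools "
           "for Inferential Network Analysis"),
    repository="Harvard Dataverse",
    doi="10.7910/DVN/2XP8YF",
    version="V1",
    fingerprint="UNF:6:agrnQnH86oRB/yOd+p8V4A==",
)
print(citation.render())
```

Version and fingerprint are optional in this sketch because, as the Dryad example in the reference list shows, not every repository supplies them. The persistent identifier is a required field, in line with the advice that a good data citation should always include one.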