New project, partly designed by a University of Cambridge researcher, aims to improve transparency in science by sharing ‘how the sausage is made’. 

A new pilot project, designed by a Cambridge researcher and supported by the Nature family of journals, will evaluate the value of sharing the code behind published research.

For years, scientists have discussed whether and how to share data from painstaking research and costly experiments. Some fields are further along in their efforts toward ‘open science’ than others. In astronomy and oceanography, for example, the equipment is so expensive and large-scale, and data collection so logistically challenging, that collaboration among institutions has become the norm.

Academic journals, including several Nature journals, have recently turned their attention to another part of the research process: computer code. Code is becoming increasingly important in research because scientists often write their own programs to interpret their data rather than relying on commercial software packages. Some journals now include data and code as part of the peer-review process.

Now, in a commentary published in the journal Nature Neuroscience, a group of researchers from the UK, Europe and the United States have argued that the sharing of code should be part of the peer-review process. In a separate editorial, the journal has announced a pilot project to ask future authors to make their code available for review.

Code is an important part of the research process, and often the only definitive account of how data were processed. “Methods are now so complex that they are difficult to describe concisely in the limited ‘methods’ section of a paper,” said Dr Stephen Eglen from Cambridge’s Department of Applied Mathematics and Theoretical Physics, and the paper’s lead author. “And having the code means that others have a better chance of replicating your work, and so should add confidence.”

Making the programs behind the research accessible allows other scientists to test the code and rerun the computations behind a study, in other words, to reproduce results and strengthen findings. It’s the “how the sausage is made” part of research, said co-author Ben Marwick, from the University of Washington. Shared code can also be reused in new studies, making it easier for scientists to build on the work of their colleagues.

“What we’re missing is the convention of sharing code or the tools for turning data into useful discoveries or information,” said Marwick. “Researchers say it’s great to have the data available in a paper (increasingly, raw data are available in supplementary files or specialised online repositories), but the code for performing the clever analyses between the raw data and the published figures and tables is still inaccessible.”

Other Nature Research journals, such as Nature Methods and Nature Biotechnology, already include code review in the article evaluation process. Since 2014, the publisher has encouraged authors to make their code available upon request.

The Nature Neuroscience pilot focuses on three questions: whether the code supporting an author’s main claims is publicly accessible, whether the code runs without errors, and whether it produces the reported results. For now, authors can choose to opt in. In future, the scheme may become mandatory, with a paper accepted only once its code has been reviewed.

“This extra step in the peer review process is to encourage ‘replication’ of results, and therefore help reduce the ‘replication crisis’,” said Eglen. “It also means that readers can understand more fully what authors have done.”

An open-science approach to sharing code is not without its critics, and some scientists raise legal and ethical questions about its repercussions. How do researchers get proper credit for the code they share? How should code be cited in the scholarly literature? How will it count toward tenure and promotion applications? How is sharing code compatible with patents and the commercialisation of software?

“We hope that when people do not share code it might be seen as ‘having something to hide,’ although people may regard the code as ‘theirs’ and their IP, rather than something to be shared,” said Eglen. “Nowadays, we believe the final paper is the ultimate representation of a piece of research, but actually the final paper is just an advert for the scholarship, which here is the computer code to solve a particular task. By sharing the code, we actually get the most useful part of the scholarship, rather than the paper, which is just the author’s ‘gloss’ on the work they have done.”

Adapted from a University of Washington press release

The text in this work is licensed under a Creative Commons Attribution 4.0 International License. For image use please see separate credits above.