For years, scientists have discussed whether and how to share data from painstaking research and costly experiments. Some are further along in their efforts toward “open science” than others: Fields such as astronomy and oceanography, for example, involve such expensive and large-scale equipment and logistical challenges to data collection that collaboration among institutions has become the norm.
Meanwhile, a variety of academic journals, including several in the Nature Research family, are turning their attention to another aspect of the research process: computer programming code. Code is becoming increasingly important in research because scientists are often writing their own computer programs to interpret their data, rather than using commercial software packages. Some journals now include scientific data and code as part of the peer-review process.
And now, with the May 25 online publication of a paper by Ben Marwick, a University of Washington associate professor of anthropology, and 13 other colleagues at universities across the United States and Europe, there are conventions and tools that researchers can use to make code sharing easier and more efficient. The team’s paper advocating the sharing of code appears in Nature Neuroscience, while the journal, in an editorial, announces a pilot project to ask future authors to make their code available for review.
Making the programs behind the research accessible allows other scientists to test the code and reproduce the computations in an experiment; in other words, to reproduce results and solidify findings. It’s the “how the sausage is made” part of research, Marwick said. It also allows the code to be used by other researchers in new studies, making it easier for scientists to build on the work of their colleagues.
“What we’re missing is the convention of sharing code or the tools for turning data into useful discoveries or information,” Marwick said. “Researchers say it’s great to have the data available in a paper (increasingly raw data are available in supplementary files or specialized online repositories) but the code for performing the clever analyses in between the raw data and the published figures and tables are still inaccessible.”
Other Nature Research journals also provide for code review as part of the article evaluation process. Since 2014, the company has encouraged authors to make their code available upon request.
The Nature Neuroscience pilot focuses on three elements: whether the code supporting an author’s main claims is publicly accessible; whether the code functions without mistakes; and whether it produces the results cited.
“This is a commitment from a high-impact journal to raise software to the status of a regular research product, that it’s not just a tool that gets discarded along the way, or hidden on a researcher’s computer where no one else can benefit from it,” Marwick said. “In the future, scientific disciplines will be shifting to a position where you need to share your code as well as your data. It will be easier to reproduce someone’s new discovery, and incorporate their discoveries into your own work.”
Imagine this scenario, Marwick said: A neuroscientist is trying to find new ways to identify early-stage tumors using 3-D brain imagery. She comes up with an algorithm that can pick out specific pixel values in an image, which helps lead to early tumor detection. By sharing the computer code and its mathematical algorithm, the scientist could facilitate a breakthrough.
The Nature Neuroscience paper resulted from a two-day workshop held in 2014 in the United Kingdom, to which Marwick, an archaeologist, was invited because of his efforts in using code and promoting open science in archaeology. A Senior Data Science Fellow at the UW eScience Institute, Marwick is active in the institute’s Reproducibility and Open Science Group, which works on tools and practices to enhance data sharing, preservation and reproducibility.
The associate director of the eScience Institute said code sharing is part of the future. “Reproducibility is literally the definition of science, and as science moves from the lab to the computer, code sharing must be at the core of how we conduct research and train students.”
An open science approach to sharing code is not without its critics, as well as scientists who raise legal and ethical questions about the repercussions. How do researchers get proper credit for the code they share? How should code be cited in the scholarly literature? How will it count toward tenure and promotion applications? How is sharing code compatible with patents and commercialization of software technology?
Marwick, who specializes in prehistoric human evolutionary ecology in Southeast Asia and Australia, has been advocating for code sharing and related open science initiatives in archaeology through the Society for American Archaeology.
“I’m just trying to shift the needle in my discipline to a practice that benefits everyone: researchers and the public,” he said.