Quality standards in the sciences have recently been heavily criticised in the academic community and the mass media. Scandals involving fraud, errors or misconduct have stirred a debate on reproducibility that calls for fundamental changes in the way research is done. As a new course at Cambridge shows, the best way to bring about change is to start in the classroom, explains course instructor Nicole Janz.

Nicole Janz

Reproducibility is considered the gold standard for scientific research. The legitimacy of any published work depends on one question: can we replicate the analysis and arrive at the same results?

However, scientific practice is far from this ideal. A recent study of reproducibility in political science found that only 18 of 120 journals have a replication policy requiring authors to upload their datasets so that others can check the results. In economics, an analysis of nearly 500 scholars' webpages showed that the vast majority do not provide access to their data and software code on their sites.

The reasons for this lack of reproducibility are simple. For researchers, being transparent means investing time in keeping detailed logs of data collection, variable coding, and all models used in the analysis. But can academics realistically make time for this? Given heavy teaching commitments and the pressure to produce and publish new work, there is little incentive to spend extra time recording research steps, maintaining a detailed filing system and making data accessible. As a result, authors often cannot remember how they arrived at their own published results.

As a consequence, much of the knowledge we trust today remains unchecked.

One of the largest scandals reported last year demonstrated the impact of a lack of transparency. In the economics paper “Growth in a Time of Debt” by Carmen Reinhart and Kenneth Rogoff, faulty results (due in part to a spreadsheet error) led to the belief that high levels of government debt are bad for economic growth. Politicians relied heavily on the paper to justify austerity measures. Because the data were not available, it took three years before the errors were uncovered by another study, conducted by a graduate student.

This example shows that reproducibility is not only a challenge for the academic world, but has a direct impact on society.

The blog Retraction Watch now reports misconduct, errors and fraud in academic journals across all disciplines almost daily, while the mass media have started reporting on irreproducibility in scientific research.

The debate has prompted new initiatives. A group of psychologists launched a project at the Center for Open Science in the US to make studies radically more transparent. A similar initiative, the “Reproducibility Project: Cancer Biology”, is now replicating the 50 most impactful cancer biology studies published between 2010 and 2012. The American Political Science Association has revised its guidelines for ethical political science research and recommends that all journals require more transparency from their authors.

However, a real change towards more transparency in research must start much earlier – at the student level.

This is the main goal of the Cambridge Replication Workshop at the Social Sciences Research Methods Centre. In eight weeks, students learn about reproducibility standards and then re-analyse a published paper in their field.

The first part of the course introduces students to reproducibility challenges. They discuss what reproducibility means and learn about current cases of failed research transparency and their consequences for the scientific community. They then discuss how to make their own doctoral work reproducible. For example, they learn which software is best suited to reproducible research, and how to set up a clear structure of files and folders, including logs of analyses and data transformations, so that years later they can still trace how they made their research decisions. Students also discuss why it is in their own interest to publish their materials in a data repository such as the University of Cambridge’s DSpace @ Cambridge.
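To make the idea of a traceable file structure more concrete, here is a minimal Python sketch of one possible setup: a project skeleton with separate folders for raw data, processed data, code, output and logs, plus a helper that appends timestamped notes about research decisions to a plain-text log. The folder names, the `research_log.txt` file and the functions `create_project` and `log_step` are hypothetical illustrations, not the layout taught in the workshop.

```python
import datetime
from pathlib import Path

# Hypothetical folder layout: the names are illustrative, not a standard.
FOLDERS = ["data/raw", "data/processed", "code", "output", "logs"]


def create_project(root: Path) -> Path:
    """Create the folder skeleton under `root` and return the log file path."""
    for folder in FOLDERS:
        # parents=True builds nested folders; exist_ok=True makes reruns safe.
        (root / folder).mkdir(parents=True, exist_ok=True)
    log_file = root / "logs" / "research_log.txt"
    log_file.touch(exist_ok=True)
    return log_file


def log_step(log_file: Path, description: str) -> None:
    """Append one timestamped, tab-separated line describing a research decision."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(f"{stamp}\t{description}\n")
```

A student might call `log = create_project(Path("my_thesis"))` once, then `log_step(log, "Recoded income into five brackets")` (a made-up example entry) each time raw data are transformed. Keeping raw data untouched in its own folder and logging every change in append-only form is one simple way to be able to retrace decisions years later.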

In the practical part of the course, students pick a recently published article in their field and try to replicate its results. Replication involves downloading the original paper, finding the data and possibly the software code, corresponding with the author, and finally publishing the replication study in the workshop’s data archive to make it available to the wider community.

This is when it hurts.

By trying to replicate existing work, students learn first-hand what irreproducibility really means. They faced the following challenges: (1) data were nowhere to be found, (2) authors did not respond to requests for data, (3) authors did not remember where they had stored their files, (4) methods were not clearly described, (5) it was unclear how raw data had been transformed, and (6) statistical models remained opaque.

This irreproducibility across all fields led to extreme frustration among students – and it demonstrated the consequences of a lack of transparency. Even the experienced Teaching Assistants were surprised at the challenges students had to face.

This is not to say that the Replication Workshop is an exercise in frustration.

In their feedback, many students reported that they learned much more about statistical methods than in any standard statistics course. They also gained insight into the analytical decisions authors make that never appear in the polished published versions of their work. One student wrote that the course “taught me so much about how to publish legitimate and correct research. I cannot wait to apply my knowledge from this course to other projects.”

One student will present his results at the International Studies Association Annual Convention this year, while others plan to embed their experience as a pilot study in their PhD. Several students are hoping to publish their replication study as the first article in their academic career.

With the Replication Workshop, Cambridge is one of the first universities to combine practical replication with teaching reproducibility standards to graduate students. Only when more universities nurture a culture of reproducibility and replication in their teaching can we ensure that the gold standard of reliable, credible and valid results is upheld.

Nicole Janz teaches research methods at the Social Sciences Research Methods Centre. She writes about reproducibility and data sharing on her academic blog Political Science Replication and comments on Twitter (@polscireplicate).

Cambridge Replication Workshop

Nicole Janz developed the Replication Workshop in 2012/13 as a new way to teach statistics and reproducibility at the same time. The course is open to graduate students and is funded by the Social Sciences Research Methods Centre. All class materials (syllabus, handouts, assignments, students’ final assignments) are freely available on Dataverse.

This work is licensed under a Creative Commons Licence. If you use this content on your site please link back to this page.