OSCR Members Jannis Stöckel and Sebastian Himmler are PhD candidates at the Erasmus School of Health Policy & Management and study how to efficiently use health care resources. They recently applied open science practices to a collaborative project, and I asked them to share their experience with the community:

By now, we think that following open science principles should be the norm and not the exception for research projects. As one of the professors in our department once put it, “we should do better in this regard compared to the previous generation of researchers”, especially now that the tools to make one’s research open are as easily accessible as they are. But besides our support of the idea of open science in general, we believed our topic specifically was well suited to applying these principles. In this project we estimated the monetary equivalent value of living one year in full health. To most people this might sound like some obscure thing only health economists could care about, but depending on where you live, such values and the related considerations can play an important role in guiding reimbursement decisions in publicly funded healthcare. In theory, studies producing such values can have real-world impact on the health of whole populations. In our view, it is only appropriate that the process that created such values is made as transparent as possible. In addition, we aimed to put a previously applied methodology to the test by using a range of specification choices in order to guide future researchers in the application of this method. It was therefore logical for us to provide the exact code we used to avoid any ambiguity for future applications based on our recommendations.
We did not plan to make this project adhere to good open science principles from the start, as we only became more interested in these practices after the project had begun. However, as the work was equally shared between the two of us, we already had to write structured, readable, and reproducible code from the first draft analyses on. While sometimes cumbersome, this deliberate practice definitely improved our skills. The whole project took almost three years (!) from first idea to publication, with large stretches in between in which only one or neither of us was working on it, so we repeatedly had to get back to code and documents we had written months before. The solid basis with respect to structuring and annotating our work made this quite easy and allowed us to gradually improve the documentation. Although we published the full code of our analysis on the OSF, we were not able to provide the data directly because researchers need to apply for the data from a third party (access is free of charge, though). As the dataset structure is fairly complex (it is a general population panel survey spanning 16 years), we had to make sure that our documentation contained all steps needed to reproduce the entire workflow from raw data (once received from the data holder) to published results. However, thanks to the consistent work we did along the way, writing up a clear set of instructions came about naturally from the structure of our script files and analyses.
The biggest benefit to ourselves from all this is clearly that, in contrast to previous projects, any questions coming up about the analysis could be answered very efficiently, while researchers wishing to build on our work would have a comprehensive starting point. And even if not directly used for the purpose of extending our work, it might already be helpful for researchers in different ways. As graduate students, we would have loved to see how complex datasets turn into results, but you can rarely find the entire workflow available. In addition, we hope this project can work as a template for future users of the same data, lowering the threshold to make their projects as transparent as possible. On the more personal side, the code can now be cited as well, and for future reference it is easy to showcase the skills we gained by working on this by simply providing the final product, while future projects using similar data can now more easily be started. Especially given the positive support of our supervisors, we are confident that, in time, familiarity with open science methods will be a sought-after skill both in academia and beyond. We also learned that most data providers are very positive about sharing code and doing so in the format the researchers see fit to lower the barriers.
Overall, going the extra mile to make this project open science has paid off and was worth the additional effort. However, there is more to do. Providing transparency with respect to published results is great, but all those little decisions we took along the way are not reported. For our next project we hope to also provide some transparency with respect to those choices in the research process that normally go undocumented, for example by finally switching to R and using GitHub to make changes easier to trace.

Jannis and Sebastian highlight the research and educational benefits for others and for themselves, an aspect of open science practices that is often discussed but not fully appreciated until researchers apply them in their own projects.

We hope that Jannis and Sebastian’s experience will inspire more researchers to openly share their analysis procedures. If you are affiliated with any school at Erasmus University, contact Antonio for personalized assistance. Also, consider joining OSCR to crowdsource knowledge from our members!

Jannis Stöckel, Sebastian Himmler, and Antonio Schettino