Open Science Concerns addressed during the ROARR2021 July session
The last session of the ROARR2021 programme was centered around questions provided by attendees. It was an informal and lively discussion that touched upon several aspects of open and responsible research, so we thought that it would be useful to provide a brief summary to interested people who could not attend the event. We are grateful to Nikita Setiaman and Milan Zarchev who, during the meeting, took the notes that are included in this blog post (in italics). We also provide additional information that we feel is relevant but could not be mentioned during the meeting due to lack of time.
Is open research always better than closed research?
There are multiple forms of open research. Open data, for example, has clear benefits: it allows others to replicate analyses and to address a wider range of research questions. Despite these benefits, other considerations, such as participants’ privacy, should take precedence over open data. There are ways to balance openness and privacy, for example by anonymizing data before sharing them. Moreover, sharing data and materials responsibly is time-consuming, and institutions might not always be willing to spend resources training staff in responsible open science.
Closed research is not necessarily worse than open research, but its verifiability is, by definition, lower. Here we highlight two examples of closed research that would have benefited from opening up at early stages:
Researchers in genetics often share supplementary files listing the genes described in their papers, to allow reuse and verification. These files are usually spreadsheets created in Microsoft Excel. Unfortunately, Excel’s default settings automatically convert some gene names to dates and floating-point numbers. For example, SEPT2 (Septin 2) is converted to ‘2-Sep’. This issue was first reported in 2004, but it remains highly prevalent today. In a recent study, the authors scanned 35,175 supplementary files, identified 7,467 gene lists, and found these errors in 19.6% of the papers that included such lists. Interestingly, they also found a correlation between these errors and journal impact factor, with Nature having the highest percentage of errors at around 30%.
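To make the conversion problem concrete, here is a minimal Python sketch that flags entries in a gene list that look like Excel auto-conversions. It is not the method used in the study above. The function name `find_suspect_entries` and both regular expressions are hypothetical, and they assume that dates are displayed in day-month form such as ‘2-Sep’ and that converted numbers are displayed in scientific notation such as ‘2.31E+19’; the actual display format depends on Excel’s locale settings.

```python
import re

# Hypothetical pattern for a gene symbol that Excel has turned into a
# day-month date, e.g. "2-Sep" (from SEPT2) or "1-Mar" (from MARCH1).
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$",
    re.IGNORECASE,
)
# Hypothetical pattern for scientific notation with a decimal point, e.g.
# "2.31E+19", which is what an identifier such as "2310009E13" can become
# once Excel parses it as a number. The unconverted identifier has no
# decimal point, so it is not flagged.
FLOAT_LIKE = re.compile(r"^\d+\.\d+E[+-]\d+$", re.IGNORECASE)


def find_suspect_entries(gene_list):
    """Return the entries that look like Excel auto-conversions of gene names."""
    suspects = []
    for entry in gene_list:
        value = entry.strip()
        if DATE_LIKE.match(value) or FLOAT_LIKE.match(value):
            suspects.append(value)
    return suspects


genes = ["SEPT2", "2-Sep", "TP53", "1-Mar", "2.31E+19", "BRCA1"]
print(find_suspect_entries(genes))  # prints ['2-Sep', '1-Mar', '2.31E+19']
```

A check like this can only detect damage after the fact. Importing gene columns as text, rather than letting Excel guess their data type, avoids the conversion in the first place.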
Two economists, Carmen M. Reinhart and Kenneth S. Rogoff, gained wide attention in 2010 after publishing a paper entitled Growth in a Time of Debt. They argued that countries whose debt exceeds 90% of their gross domestic product experience a sharp drop in economic growth. This led some politicians to promote austerity measures rather than policies to stimulate their country’s economy. However, other economists were skeptical. First, the paper made unwarranted causal claims: does high debt cause low growth, or vice versa? Second, similar data collected by other researchers did not show the same trend. The community asked for the analysis spreadsheet, which Reinhart and Rogoff shared in 2013. Upon inspection, some analysis choices turned out to be questionable, and some data had been omitted from the calculations because of a spreadsheet error. The corrected results do not show the purported 90% threshold. A summary of this controversy can be found in The New Yorker.
Having said that, there are sensitive research topics for which protecting participants’ privacy is paramount, for example studies on human trafficking. However, depending on the nature of the data and national legislation, it may still be possible to share such datasets once they are properly anonymized. An example is the Global Data Hub on Human Trafficking, which contains anonymized data from 108,000 individual cases of human trafficking that can be freely downloaded for inspection and reuse.
How do researchers engage with open data? Do they see value in and conduct their own research (re)using open data?
The main difference is flexibility: data you collect yourself give you much more control over how they can answer the research question at hand, whereas secondary data restrict you to whatever the original researchers decided to measure. This flexibility comes at a price: self-collected data are typically much more limited in scope, resources, and number of participants, and therefore in statistical power. Large public datasets, by contrast, are often more than adequately powered. Finally, sharing your own data has the added benefit that other people can work with it and double-check your analyses and results.
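The link between sample size and power can be illustrated with a quick calculation. The sketch below uses the standard normal approximation for a two-sided comparison of two group means. The function name `approx_power`, the effect size of 0.3 (Cohen’s d), and the two sample sizes are hypothetical choices for illustration, not figures from the session.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation, which slightly overstates power for
    small samples compared with an exact t-test calculation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical comparison: a small lab sample vs a large public dataset.
for n in (50, 500):
    print(n, round(approx_power(0.3, n), 3))
```

Under these assumptions, 50 participants per group gives roughly a one-in-three chance of detecting the effect, whereas 500 per group pushes power above 0.99.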
How do you feel about preregistering analyses when you already have the data (secondary data)?
Templates can be very helpful in guiding you through what to consider before analysing data you already have, for example, how to account for knowledge of the data you may already have gained through exploration or other means. If you already know something about a measure that affects decisions in your paper, state it explicitly so that you remain transparent. A template for preregistration of secondary data can be found at https://osf.io/v4z3x/.
What if your data turn out to be skewed, but you did not anticipate this when preregistering?
Decision trees can help you plan, in advance, what to do if your data turn out to be skewed (some comprehensive preregistration templates may help in that regard). If your preregistration does not include this option, you can still adjust your analyses, as long as you transparently report such deviations from the preregistered plan. After all, the purpose of preregistration is to increase transparency.
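A decision tree of this kind can be written down as an explicit rule before the data are analysed. The Python sketch below is one hypothetical example, not a recommendation from the session: it assumes two groups are compared, treats “skewed” as an absolute sample skewness above an arbitrary cut-off of 1.0, and in that case switches from Welch’s t-test to the Mann-Whitney U test. The names `sample_skewness`, `choose_test`, and `SKEW_THRESHOLD` are illustrative.

```python
import statistics

# Hypothetical cut-off; a real preregistration would state and justify its own.
SKEW_THRESHOLD = 1.0


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no spread, so no skew
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)


def choose_test(group_a, group_b, threshold=SKEW_THRESHOLD):
    """Apply a preregistered rule to pick the test for comparing two groups."""
    worst_skew = max(abs(sample_skewness(group_a)), abs(sample_skewness(group_b)))
    if worst_skew > threshold:
        return "Mann-Whitney U test"
    return "Welch's t-test"


print(choose_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # prints Welch's t-test
print(choose_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 2, 2, 3, 10, 25]))  # prints Mann-Whitney U test
```

Writing the rule down this precisely makes any later deviation, such as a situation the rule did not foresee, easy to spot and report.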
What if my research team does not care about open science and would prefer I do not waste my time on it?
One way to capture the attention of skeptical researchers is to point out that more and more funding agencies have policies mandating or encouraging open research practices. Engaging in open research can therefore give new research proposals a competitive edge. This is one example of how open science can be linked to incentives that researchers already care deeply about (e.g., funding, publications). There is an ongoing debate about the appropriate response when colleagues categorically refuse to adopt open research practices. Some argue that leaving is a perfectly valid option; others point out that, to avoid a biased selection of researchers in science, one should stay until given the chance to change things for the better. Some tips and tricks to convince your supervisors have been discussed in this Twitter thread and this preprint.
Several funding agencies require commitment to open access and open data. One prime example is the recently launched Horizon Europe programme, whose guidelines include a 16-page section on open science. It clarifies the mandatory and recommended open science practices that researchers must take into account when submitting a project.
Some funding agencies even freeze funding during a project if their policies are not followed. One example is the Wellcome Trust in the UK, whose open access policy states that non-compliance may lead to:
- no formal notification of funding renewals or new grants
- no acceptance of new grant applications
- if several researchers from the same institution fail to comply, Wellcome will work together with the institution to find a solution. If the institution does not cooperate, grant payments towards it can be suspended
How can I protect myself from a supervisor’s requests to engage in questionable research practices (QRPs)?
If supervisors want to change analyses, add covariates post hoc, or engage in other QRPs, preregistration can help mitigate the adverse effects, because any deviation from the initial plan needs to be justified and reported.
Elisabet Blok, Bing Xu, Lorenza Dall’Aglio, and Antonio Schettino