5 Further Reading

Here are some extra resources researched and shared by the class.


To add a section, please:

  1. Go to the repository for this web-book on GitHub.
  2. Click on the “Fork” button towards the upper right to make your own copy of the repository in your own GitHub profile.
  3. In your copy of the repository on GitHub (online), open “06-futherreading.Rmd”.
  4. Copy (don’t just edit!) the example text* between the lines below, paste it below the lines, and replace the placeholders with your own link, name, text, etc. You don’t have to add your name if you don’t want to, as long as I know your Git username.
  5. Save and add a commit message along the lines of “Added new resource under further reading”.
  6. Next, open a “pull request”, which will send a request to me to pull your new code into the main (my) version of the repository.
  • From your GitHub repo (not mine!), click on Pull requests (top left) and hit the green New Pull Request button.
  • Follow the instructions, creating a title, message, and confirming that you want to create the pull request (you may be asked to confirm a couple of times).

That should be it!

*Note: For the example below, I copied, pasted and minimally edited text directly from the resource. This is OK because they shared the resource under an MIT licence and I’m acknowledging them, but for this exercise I would like you to use your own words as far as possible.
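If you prefer working from R rather than the GitHub website, the fork/edit/pull-request steps above can also be done with the usethis and gert packages. A minimal sketch follows; the repository path "owner/repo" and the branch name are placeholders (they depend on where this web-book lives), and the workflow assumes you have a GitHub personal access token configured:

```r
# Sketch of the fork -> edit -> commit -> pull request workflow from R,
# using usethis and gert ("owner/repo" is a placeholder path)
library(usethis)

# Fork the web-book repository on GitHub and clone your fork locally
create_from_github("owner/repo", fork = TRUE)

# Create a branch for your contribution
pr_init(branch = "add-further-reading")

# ...edit 06-futherreading.Rmd in RStudio, then stage and commit it...
gert::git_add("06-futherreading.Rmd")
gert::git_commit("Added new resource under further reading")

# Push the branch and open a pull request in your browser
pr_push()
```

Either route ends in the same place: a pull request on the main repository that I can review and merge.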


5.1 library(osfr) for interacting with the Open Science Framework (OSF) from R

library(osfr) is an R package that allows you to interact with OSF directly from R. OSF is a free and open-source project management repository designed to support researchers across their entire project lifecycle. The service includes unlimited cloud storage and file version history, providing a centralized location for all your research materials that can be kept private, shared with select collaborators, or made publicly available with citable DOIs. You can use library(osfr) to explore publicly accessible projects and download the associated files — all you need to get started is the project’s URL or GUID (global unique identifier). library(osfr) can also be used for project management: creating projects, adding sub-components or directories, and uploading files.
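As a minimal sketch of the workflow described above (the GUID, project title and file paths here are invented for illustration; the authenticated steps require a token via osf_auth()):

```r
library(osfr)

# Retrieve a public OSF project by its URL or GUID
# ("abcde" is a hypothetical GUID)
project <- osf_retrieve_node("abcde")

# List the project's files and download them locally
files <- osf_ls_files(project)
osf_download(files, path = "data/")

# With authentication, osfr can also manage your own projects:
# create a project, add a sub-component, and upload a file
my_project <- osf_create_project(title = "My reproducible project")
analysis <- osf_create_component(my_project, title = "Analysis")
osf_upload(analysis, path = "analysis/cleaning.R")
```

All of these calls are standard osfr functions; only the identifiers and paths are placeholders.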

Contributor: Jasper


5.2 Webinar about irreproducibility in Ecology

This seminar highlights the replicability crisis, explains why replicability is particularly important in ecology, and shows how these ideas are applied in the presenter’s own studies. Irreplicability is a widespread issue across scientific fields, driven by questionable research practices such as cherry-picking and reporting only desirable results, as well as broader problems in academic culture. Many ecologists would rather publish ‘original’ work, and funding for replication studies is hard to find, so the number of replication studies being done is decreasing. The presenter explains these issues in more depth, with examples, and finally describes how, and with which studies, she plans to address the issue of irreplicability in the ecological sciences. Link to video: https://youtu.be/3hnYSTXkoJA?si=BKQ1Yk3LgEtELlvN

Contributor: Meghan M


5.3 Ivimey-Cook et al. (2025) “From policy to practice: progress towards data- and code-sharing in ecology and evolution”

Code and data sharing are a key component of reproducible research. However, in ecology and evolution journals, compliance with code- and data-sharing policy is poor. This paper investigated data- and code-sharing policies and compliance for 275 journals in ecology and evolution. The study found that only 22.5% of journals encouraged data sharing and only 38.2% made data sharing mandatory. Similarly, only 26.6% encouraged code sharing and only 26.9% made it mandatory. The paper also provides seven recommendations for journals on how to improve data- and code-sharing policies and compliance, including: policies must be findable and clear, code sharing must be mandatory, policies must apply to all types of data, and journals should have data editors to assess data and code quality. This paper is interesting because it looks at the role of policy in promoting reproducible research and how journals can more effectively create and uphold policies.

https://doi.org/10.1098/rspb.2025.1394

Contributor: Hannah


5.4 Multi-Stakeholder Research Data Management Training as a Tool to Improve the Quality, Integrity, Reliability and Reproducibility of Research

This paper focuses on something very important for data management and reproducible research: it evaluates a training program designed to enhance research data management (RDM) competencies among doctoral students and postdoctoral researchers. The program was interesting because it was collaborative, bringing together multiple stakeholders, and it was pitched at a generic level, designed to be relevant across a range of disciplines rather than overly specific to one. The finding was essentially that training can improve understanding of RDM practices: feedback and competency ratings from participants indicated that “key RDM practices, such as data planning, documentation, and organization, were better understood after the training.”

https://liberquarterly.eu/article/view/11726/13897

Contributor: Ayanda


5.5 Lisa DeBruine and Dale Barr (2021) “Data Skills for Reproducible Research”

Producing reproducible research requires learning certain skills and tools which allow you to manage your data efficiently and record the details of your analysis. DeBruine and Barr first show how to create a project in RStudio and the benefits this brings to your reproducible workflow: a clear filing system, for example, makes reading in data easy. Secondly, they give an introductory tutorial on R Markdown, including how to embed exploratory visual analyses, such as figures and tables, and preliminary analysis results in your R Markdown script. Finally, they cover citing statistical packages, which is essential in making your data analysis more transparent and easier to replicate.

Zenodo DOI: 10.5281/zenodo.6527195, or see https://psyteachr.github.io/reprores-v2/repro.html for more.

Contributor: Ash S


5.6 Reproducibility trial: 246 biologists get different results from same data sets

In this article, Anil Oza reports on a paper (under peer review at the time) investigating the reproducibility of ecological data analysis. The paper showed that 246 different scientists got widely different results from the same research question and data set. This raises the question of what can be trusted as a true and proper analysis with a correct result. Much of it boils down to the lack of an agreed structure for how to analyse the data, with different researchers each having their own approach. Part of the proposed solution is to use power analyses to find the correct sample sizes, and to adopt standardised ways of reviewing data that are not necessarily part of our field yet, like the robustness tests used in economics. This article and paper further cement the importance of reproducibility and its practices.

Link to article: https://www.nature.com/articles/d41586-023-03177-1

Contributor: Judi


5.7 Conference Talk on Building Reproducible and Replicable Projects

This YouTube video shows a talk by Dan Chen at the New York 2019 Conference. He gives a very short but concise introduction to reproducibility vs replicability. More importantly, he also talks about structuring files in a way that aids reproducibility (Key Moment: File Inputs), which will also help you keep track of your data. He also gives some useful advice on how to draw out your scripts in a concise way (Key Moment: Draw your build scripts). His talk highlights that reproducible research is not only good scientific practice but also creates an environment that is easier for you to manage in the long term.

Contributor: Sarah Frances Visser


5.8 British Ecological Society (2014) “A Guide to Data Management in Ecology and Evolution”

The Guide to Data Management in Ecology and Evolution is a comprehensive resource for early-career researchers that focuses on the importance of data management in producing high-quality, easily accessible, and reusable research data. It explains what data and data management are and why they matter, and provides advice and examples of good data management practices. It covers the various stages of the data lifecycle, including planning, creating, processing, documenting, preserving, sharing, and reusing data. The guide emphasizes the growing importance of collaborative and interdisciplinary research, highlighting the British Ecological Society’s (BES) efforts in promoting data archiving and accessibility. It also addresses challenges and best practices in the collection, digitization, processing, documentation, preservation, and sharing of data, as well as ethical aspects such as intellectual property rights, licences, and permissions related to the reuse of data. Finally, it provides practical advice on file naming, assuring high-quality data, processing data, version control, documenting data, creating metadata, using non-proprietary file formats, and organizing data effectively.

See: https://www.britishecologicalsociety.org/wp-content/uploads/Publ_Data-Management-Booklet.pdf

Contributor: Melissa Malan


5.9 The British Ecological Society (2017) “A Guide to Reproducible Code in Ecology and Evolution”

The guide, created by the British Ecological Society, is a fully comprehensive practical guide to achieving reproducibility in ecological and evolutionary research. It covers a wide range of topics, from the initial steps needed to create a project workflow, to organising projects, programming, version control and report writing. The guide makes use of clear, easy-to-follow instructions, “dos and don’ts”, and examples of code, file and document naming, citations, and so on. These recommendations and tools are tailored to the specific challenges and contexts of data analysis and report writing within ecological and evolutionary research. Where necessary, clear and concise definitions are provided for the reader, which help to unpack the foreign and often confusing terms that come with learning new software like Git. The guide is particularly helpful in explaining how different packages can streamline the entire report-writing process; for example, you can finely control how figures are generated and placed within your report using the package “knitr”.
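For instance, knitr’s figure control is set through chunk options in an R Markdown chunk header. A minimal sketch (the chunk label, data and option values below are invented for illustration):

````markdown
```{r sepal-plot, fig.width=6, fig.height=4, fig.align="center", fig.cap="Sepal length vs sepal width in the iris data"}
plot(iris$Sepal.Length, iris$Sepal.Width)
```
````

Options like `fig.cap`, `fig.width` and `fig.align` determine how each figure appears in the rendered report, so the figure layout itself becomes part of the reproducible script.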

See https://www.britishecologicalsociety.org/wp-content/uploads/2017/12/guide-to-reproducible-code.pdf for more.

Contributor: Robyn


5.10 Six factors affecting reproducibility in life science research and how to handle them

https://www.nature.com/articles/d42473-019-00004-y

This advertisement feature gives a good overview of the topic of reproducibility. It starts by outlining what reproducibility is and how it has become problematic in science, since the majority of published papers are currently not reproducible. The article then explains several factors that cause the lack of reproducibility. These include the inaccessibility of methodological details, the use of misidentified or contaminated samples in microbiology, the difficulty many researchers have with managing complex datasets, bad experimental designs and research practices, cognitive bias, and the pressure to publish novel and extraordinary findings. The article goes on to recommend ways of producing reproducible research, and outlines the work of several initiatives that address and support research reproducibility.

Contributor: Svenja


5.11 Braga et al. (2023) “Not just for programmers: How GitHub can accelerate collaborative and reproducible research in ecology and evolution.” Methods in Ecology and Evolution

https://doi.org/10.1111/2041-210X.14108

This article provides practical guidance on using GitHub for researchers in ecology and evolutionary biology. GitHub is described as “the most used web-based platform which has features for more collaborative, transparent, and reproducible research”. The article outlines GitHub features ranging from low to high technical difficulty, including code and coding management, writing a manuscript, conducting peer review, and using automated continuous integration to streamline analyses, which allows for early detection and correction of errors and can improve confidence in scientific development. It also provides definitions of basic Git concepts (such as commit, push, pull, fork, repository, etc.) for users who are less familiar with software terms, along with practical details: for instance, GitHub limits committed file sizes to 100 megabytes, meaning that external file storage alternatives may be required. The article further describes how long-term management of laboratory materials, such as notebooks, can be done within GitHub, and how GitHub allows centralized communication for all project-related conversation and planning within a team, rather than emailing, making it easier to keep track of progress throughout the lifespan of a project.

Open Science Framework repository (accessible at https://osf.io/bypfm/; Braga et al., 2023) and GitHub repository (accessible at https://github.com/SORTEE-Github-Hackathon/manuscript) for this study.

Contributor: Ongezile Mpehle


5.12 Grames, E.M. and Elphick, C.S., 2020. Use of study design principles would increase the reproducibility of reviews in conservation biology. Biological Conservation

https://doi.org/10.1016/j.biocon.2019.108385

This scientific article highlights the dangers of biological reviews that are not conducted in a reproducible way. Biological reviews are very useful for gaining an overview of a topic and the trends seen worldwide. However, these reviews need to be taken with a pinch of salt: one should conduct a thorough assessment of the methods before taking a review into account. Grames and Elphick (2020) suggest five steps for conducting a review to the same reproducible standard that experiments and field studies should have. The first step is “searching for studies”; most reviews they examined do not explain how the studies used in the review were found, which highlights the potential for selection bias based on familiarity, access, and the time when the search was done. Trying to repeat a review without the full methods of how the search was conducted would be impossible. The second step is “selecting studies for inclusion”, as reviews must define their selection criteria. Without clearly defined criteria, selection bias can occur based on familiarity or on choosing studies that agree with the result the reviewer wishes to achieve. The third step is “critically appraising the quality of included studies”, as different studies are carried out to different standards and thus provide varying strength of evidence. The fourth step is “coding studies and extracting data”: variables need to be defined, just like in regular studies. The fifth step is “synthesizing results”; many reviews do not explain how their results are synthesized, leaving them irreproducible. The paper ends by emphasizing the need for biological reviews to have a standardised format to ensure reproducibility.

I chose this paper as I have read and used many reviews in projects and research tasks. It makes me question their methods and how one must be careful when using a review paper. They are still valuable but need to be well scrutinised.

Contributor: Hannah Glanvill


5.13 Kellie, D. and Westgate, M. (2023) Improving code reproducibility: Small steps with big impacts, Research Communities by Springer Nature.

https://communities.springernature.com/posts/improving-code-reproducibility-small-steps-with-big-impacts (Accessed: 12 February 2024).

This is a short blog about reproducible research. It notes that despite the push to make data openly available, many studies are still not reproducible. Although tools have been developed to improve reproducibility, the authors argue that with an academic’s busy schedule it doesn’t seem realistic to put in the time and effort to learn them, and they find it frustrating that learning these tools is so often the solution papers offer. As creators of such tools for the Atlas of Living Australia, the authors have learnt much about code reproducibility and share three achievable steps to increase it: have a shareable work environment (i.e. use code and repositories); ensure your code and notes are understandable; and lastly, save your code (including wrangling steps) as a document, preferably with Quarto. The latter helps with sharing, efficiency, and recovery if code breaks.

This blog raises the relevant issue of not knowing where to start in improving reproducibility. There are many suggestions for before, during and after data collection, many relying on complex tools. This can quickly become overwhelming, causing one to avoid reproducibility altogether. To avoid this, three straightforward steps within the coding process are presented, offering a practical starting point for enhancing reproducibility. Additionally, the blog’s publication in Research Communities, a forum for people interested in research, enhances its reach: raising awareness among research-oriented people will help to create a culture of reproducibility.

Contributor: Lauren Schoeman


5.14 Powers et al. (2019) “Open science, reproducibility and transparency in ecology”

https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/eap.1822

Powers et al. (2019) outline “open science” as the best standard both for individual researchers and for science broadly. They define open science as the sharing of code, data and metadata along with continuous public communication, which roughly coincides with the definition of reproducible research in other papers (e.g. Peng 2011), although Powers et al. (2019) stress the use of social media and online forums. This allows “peer review on the fly”, they argue, which makes the final study more rigorous than it would otherwise be. They note the difficulties with reproducing ecological studies and the hesitancy to publicize the locations of threatened species, but argue that the benefits outweigh the costs. The general benefits are faster collaboration and sharing of ideas, avoiding time-wasting exercises, and building public trust in governmental research agencies. Personal benefits include higher citation rates, and the foremost scientists in any field tend to be the most prolific data sharers.

Overall, Powers et al. (2019) is similar to other papers on the subject, although they are the first I have seen to emphasize public feedback during the research. This has the potential to be very useful, but premature exposure to the public runs the risk of circumventing the peer review process. Care should be taken to ensure continuous feedback comes from appropriate sources.

References:

  • Peng, R.D., 2011. Reproducible research in computational science. Science, 334(6060), pp.1226-1227.
  • Powers, S.M. and Hampton, S.E., 2019. Open science, reproducibility, and transparency in ecology. Ecological applications, 29(1), p.e01822.

Contributor: Ben W


5.15 Zuduo Zheng (2021) “Reasons, challenges, and some tools for doing reproducible transportation research”

https://doi.org/10.1016/j.commtr.2021.100004

This paper outlines the context for reproducible research and many of the challenges we have already discussed. However, it also outlines some key benefits to both the researcher and the readers. Zheng states that making research reproducible can elevate your reputation as an academic and often comes with more citations. Once in the habit of working reproducibly, it also helps you work efficiently while reducing the probability of making critical mistakes. The benefits to readers are that they can trust the research, and that the analysis is easy to reproduce, which creates a more productive workflow.

Contributor: Mia S


5.16 TED Talk about Reproducibility Crisis

For those who grasp concepts better with videos - this TED Talk goes over the Baker (2016) paper and the reproducibility crisis.

https://www.youtube.com/watch?v=FpCrY7x5nEE&t=190s&ab_channel=TED-Ed

Contributor: Emm


5.17 S. M. Powers, and S. E. Hampton. 2019. Open science, reproducibility, and transparency in ecology. Ecological Applications 29(1):e01822. 10.1002/eap.1822

The paper discusses the challenge ecologists face in replicating observational studies due to the ever-changing natural world and evolving perceptions. While perfect replication may be unattainable, achieving computational reproducibility—generating equivalent analytical outcomes using the same data and code—promotes transparency, crucial for maintaining public trust. Open data facilitates transparency and reproducibility, though rapid scientific progress is often the primary incentive for open science. Making data and code openly accessible allows scientists to build upon previous research and address new questions. The adoption of open research practices is increasing, often driven by ethical, moral, or practical considerations. It’s anticipated that more journals will encourage or mandate code archiving in their data policies. The paper emphasises the diminishing excuses for not sharing data and code in publicly accessible archives. As more ecologists engage in open science, tools supporting this transition, such as data management, analysis, and visualisation methods, are proliferating and becoming central to ecological culture.

Contributor: Sarah


5.18 Ellison (2010) “Repeatability and transparency in Ecological research”

A fundamental principle in science is that results must be reproducible before they are accepted as fact. However, the context-dependent nature of ecology makes perfect replication virtually impossible. One way to make ecological studies easier to reproduce is through the assembly of derived data sets and their subsequent analysis; this is known as ecological synthesis. The key to ecological synthesis is transparency and full disclosure of all aspects of the study. However, despite mandates from funding agencies that data be made publicly available, raw data are not always easily accessed. Fortunately, data repositories are growing, and with descriptive metadata – a method for describing ecological data – data are made easier to interpret and re-use. Once obtained, a data set will require a certain degree of transformation, after which it is ready for analysis. It is essential that researchers clearly document the steps taken to derive the data and record all statistical tools used on the data to ensure reproducibility. Such a record should be a requirement for publication of any meta-analysis. Unfortunately, there are no standards for process metadata, and open-source software tools, while attractive, are constantly evolving. Moving forward, it appears that more ecologists are recognising that sharing data benefits the entire scientific enterprise and results in successful collaboration. This has resulted in the rapid development of data-sharing tools.

https://www.jstor.org/stable/27860827

Contributor: Cailin


5.19 Meng, X. 2020. Reproducibility, Replicability and Reliability

This article highlights how reproducibility outweighs replicability in terms of the reliability of reported results. The author also brings attention to the lack of professional communities focusing on reproducibility verification. The author goes on to argue that reproducibility must adhere to strict rules, while replicability has more flexibility due to inherent uncertainties and possible confounding factors. Meng (2020) also explains why reproducibility is more desirable than replicability: replicability does not focus on improving upon previous studies, while reproducibility allows for permutations and future improvements.

DOI:10.1162/99608f92.dbfce7f9

Contributor: Shane


5.20 White et al. (2015) “Reproducible research in the study of biological coloration”

https://doi.org/10.1016/j.anbehav.2015.05.007

According to the paper, a study is considered truly reproducible when methods are completely reported, data are archived and publicly available, and every modification of raw data is documented and preserved. In the field of biological coloration, the direct measurement of reflectance through spectrometry has been adopted as the standard, although digital photography for quantifying color is increasingly gaining popularity. The paper’s guidelines for reporting color analyses cover the details of spectral analysis and visual modelling methods, which belong in metadata and are essential for reproducibility. To assess reporting and reproducibility in the field, the authors searched journals using phrases such as ‘color’ or ‘spectra’. They also recorded whether the data and/or code were openly available, and this information, along with their analysis script, was stored on GitHub. Their survey found inconsistency and incompleteness in commonly reported methodological details; some papers failed to report details such as distance and pixel numbers. They also found that 38% of papers referred to previous work for details, which were often incomplete as well, and therefore suggested reporting all details within the current manuscript. Of the studies analysed, 1.7% openly provided raw data, 31.7% provided processed data, and 66.7% did not provide data. Overall, there is evidence that this trend is shifting as funders and publishers mandate data release.

Contributor: MuanoR


5.21 Voelkl et al. (2020) Reproducibility of animal research in light of biological variation

DOI: 10.1038/s41583-020-0313-3

This paper defines reproducibility as “The ability to produce similar results by an independent replicate experiment using the same methodology in the same or a different laboratory.” Although this definition varies from the one we have been using, the paper deals largely with the reproducibility crisis (particularly in animal research) and offers a potential cause and solution. Scientists often treat biological variation as noise. They therefore try to homogenise their sample group in order to increase power and thereby use fewer animals, an approach typically deemed more ethical. However, homogenisation makes it difficult for future studies to reproduce results, as variation is integral to biology. More tests then have to be performed, which requires more animals in the long run. The authors therefore recommend deliberately including relevant biological variation in the sample group. It is vital for scientists to record every decision and process clearly so that studies may be successfully reproduced in the future.

Contributor: Ashleigh Gouws


5.22 Reproducible Research and Public Data-Sharing in Ecology: A Q&A with LAGOS Creators

A recent study published in GigaScience introduces LAGOS (LAke multi-scaled GeOSpatial and temporal database), an open database compiling hundreds of ecological datasets. LAGOS supports macrosystems ecology research, which analyses ecological interactions at regional and continental scales. Professor Pat Soranno of Michigan State University describes the motivations behind LAGOS, emphasizing the need to study large ecological patterns, such as freshwater ecosystems’ role in global carbon cycles. A key challenge in ecology is “dark data”: valuable datasets trapped on individual researchers’ computers, where they are inaccessible. LAGOS addresses this problem by integrating diverse datasets into a reproducible, publicly available resource. By making data more accessible, LAGOS enables scientists to ask large-scale, impactful ecological questions and contributes to a growing movement of data-sharing in environmental research.

Link: https://blogs.biomedcentral.com/on-biology/2015/07/01/reproducible-research-public-data-sharing-ecology-qa-lagos-creators/

Contributor: Robyn


5.23 “Wildlife Biology, Big Data, and Reproducible Research” by Keith P. Lewis, Eric Vander Wal and David A. Fifield (2018)

Historically, the field of wildlife biology has been regarded as data poor, and certain study topics, like animal movement, were not even regarded as part of mainstream ecology because of the lack of data. However, technological advancements (e.g., GPS tags and camera traps) have allowed scientists to produce larger, more accurate data sets, which allow wildlife biologists to conduct analyses over large spatial and temporal scales. This has shifted wildlife biology into the realm of “big data”. These changes also mean that the way in which wildlife biologists manage, analyse and store data must change. The paper by Lewis et al. (2018) provides a conceptual framework for reproducible research in wildlife biology. They also present case studies which further emphasise the importance and advantages of using reproducible research practices in the field. The authors conclude that advances in the complexity of technology and studies require a simultaneous change in the way wildlife biologists conduct research and teach others to conduct research.

Reference: Lewis, K.P., Vander Wal, E., & Fifield, D.A. (2018). Wildlife Biology, Big Data, and Reproducible Research. Wildlife Society Bulletin, 42(1), 172–179. https://doi.org/10.1002/wsb.847

Contributor: Muhammad Uzair Davids


5.24 Reproducible research can still be wrong: Adopting a prevention approach (Leek & Peng, 2015)

This article seeks to look beyond the problem of reproducible research and makes an appeal for greater education in actual data analysis and study design. Essentially, it highlights that although reproducible research is very important in diagnosing faults within the methodology, data or analysis, it doesn’t fix the faulty thinking that created the problem. From the perspective of the authors, reproducibility is a “treatment” against “poor data analyses” (Leek & Peng, 2015) but it is not a preventative measure. With increasingly complicated data and analyses, and higher rates of paper submissions to journals, even with transparency and full reproducibility, mistakes may not be intercepted in time to prevent negative effects. The authors propose scaling up data science education through massive open online courses (MOOCs) and crowd-sourced short courses in data analysis, focused on teaching the methods that are most reproducible and replicable for the novice data analyst. Part of the challenge in implementing this is the need to identify the most efficient current statistical methods, analysis types and software for the job; the authors call this approach “evidence-based data analysis”.

Leek, J.T. & Peng, R.D. (2015). Reproducible research can still be wrong: Adopting a prevention approach. Proc. Natl. Acad. Sci. U.S.A., 112(6), 1645–1646. https://doi.org/10.1073/pnas.1421412111

Contributor: Nic van Tol


5.25 Ocean FAIR Data Services (Tanhua et al., 2019)

Well-established data management systems are crucial for ocean observing systems, as they ensure essential data is collected, preserved, and accessible for current and future users. Effective management involves collaboration in various areas such as observations, data and metadata assembly, quality assurance/control, and publication. To promote long-term usability, ocean data should follow the FAIR principles—Findable, Accessible, Interoperable, and Reusable—which help transition unstructured data into organized, standardized datasets. As ocean observing systems evolve, new autonomous platforms collect more essential ocean variables, necessitating modern data infrastructures to automate the data life cycle. However, several challenges exist in implementing FAIR data management, including diverse oceanographic data; a wide range of different data management structures; high data volumes; new sensor formats; formats that are not universally applicable; gaps between data producers and end-users; slow development of common protocols; and poorly defined best practices. The paper discusses solutions to these challenges, enabling scientists and developers to download data and apply traditional analysis techniques, and allowing modern tools to be used to transform, manipulate, visualize, and innovate with data. The solutions promote reusability by making sure data can be understood by those who did not produce it. It is essential to explain the importance and implementation of FAIR data principles to early-career marine scientists, so that these principles and effective data management practices become second nature and an integral part of ocean observations.

Link: https://doi.org/10.3389/fmars.2019.00440

Contributor: Kayla Heuer


5.26 The Tao of open science for ecology: a path towards effective data management and reproducibility

Hampton et al. (2015) logically and enthusiastically advocate for a shift in mindset and practice towards ‘open science’ in ecology. They propose a series of targeted approaches based on data stewardship, transparency across the data life cycle, using open-source tools, and clearly documenting the scientific process of discovery. In the context of data management and reproducibility, transparency allows scientists to standardize code, data and metadata storage and archiving throughout the data life cycle. Clear documentation throughout the scientific process enables reproducibility and replication, and creates space and material for discussion, constructive feedback, and collaboration. Ultimately, the Tao of open science advocates for practices, tools and methods of scientific inquiry that depart from the “exclusive silos of traditional journal archives” and instead create open-access, easily machine-readable, logically structured and standardized repositories of data, publications and code. This shift will benefit current and future scientists pushing the frontiers of discovery, as it encourages, and even demands, careful and consistent critical thinking, organization and documentation across every scientific endeavour.

Link: https://esajournals.onlinelibrary.wiley.com/doi/full/10.1890/ES14-00402.1

Contributor: Arwenn


5.27 Research Data Management for Reproducibility – SciNet HPC at the University of Toronto

This session provides a high-level overview of research data management (RDM) and its role in ensuring reproducibility. Rather than focusing on discipline-specific methods, it explores broader developments, challenges, and future directions. It covers the history of reproducible research, key challenges, and the importance of applying FAIR principles (Findable, Accessible, Interoperable, Reusable). The movement began in the 1990s with Jon Claerbout, who argued that openly sharing data and code helps prevent errors in analysis. Whitaker’s Matrix of Reproducibility categorises reproducibility into four levels: reproducible (same data, same code, same results), replicable (different data, same code, same results), robust (same data, different code, same results), and generalisable (different data, different code, same results). The session highlights replication issues in psychology, economics, and social sciences, where non-replicable studies remain highly cited. In computational research, challenges include high costs, rapidly evolving software, and access restrictions. The Research Data Alliance promotes the idea of a “Research Compendium” to ensure reproducibility. It concludes with US National Science Foundation recommendations on documentation, statistical transparency, and policies to improve replication efforts. As reproducibility grows in importance, RDM plays a key role in ensuring efficient reuse of data, code, and documentation, thereby maintaining the credibility of research.

Contributor: Lilith


5.28 “Ten Years of Sharing Reproducible Research” by Limor Peer, PhD (Yale University)

This article reflects on a decade of advancements in promoting reproducibility in research, especially the challenges and successes of Yale’s ISPS Data Archive. It addresses the obstacles faced when trying to reuse shared data and code, such as poor documentation, software incompatibility, and computational failures that can hinder reproducibility. The importance of curators in ensuring long-term accessibility is also highlighted, alongside YARD, a workflow tool that enhances data curation and verification. Open science initiatives and journal policies have improved reproducibility; however, challenges remain, such as maintaining functional code over time and integrating automation with human oversight. This piece offers valuable insights into the evolving landscape of research transparency and the critical role of structured data stewardship.

Link: https://researchdataq.org/editorials/ten-years-of-sharing-reproducible-research/

Contributor: Moegamad Uzair Jack


5.29 Centre for Open Science’s Transparency and Openness Promotion Guidelines

The Transparency and Openness Promotion (TOP) guidelines from the Centre for Open Science (COS) provide a framework for improved transparency, reproducibility, and overall integrity in research. Introduced in 2015, these guidelines set out eight standards, each with different levels of implementation, for researchers and journals to strive for in the movement towards open science. These include citation standards, replication, and data and analytical methods transparency, among others. The website allows you to explore the guidelines and understand best practices to improve your own research. You can also search for specific journals to see what measures they are taking to ensure the transparency of the work they publish. The guidelines help build a strong foundation for reducing error and bias in research, and encourage collaboration and replication between researchers, thereby increasing the credibility and visibility of scientific research.

Link: https://www.cos.io/initiatives/top-guidelines

Contributor: Jana


5.30 Earth Observation Open Science: Enhancing Reproducible Science Using Data Cubes

This editorial for the “Earth Observation Data Cube” Special Issue in the journal “Data” considers reproducibility in Earth Observation science. Earth Observation (EO) datasets generated by satellites are often large, complex, and heterogeneous, which limits use of the data. Earth Observation Data Cubes (EODC) represent a step forward in making geospatial data freely and openly available from a variety of data repositories. By grouping data into multidimensional matrices, EODCs enable easier storage, organisation, management, and analysis of large EO datasets. This allows the data to be used by a greater number of users, addresses the under-utilisation of massive amounts of useful data, and contributes to making EO science open and reproducible.

Giuliani, G., Camara, G., Killough, B. & Minchin, S., 2019. Earth Observation Open Science: Enhancing Reproducible Science Using Data Cubes. Data, 4(4), 147. https://doi.org/10.3390/data4040147

Contributor: Jess Prevôst


5.31 Keeping a Paper Trail: Data Management Skills for Reproducible Science (Dr K. Laskowski)

This YouTube video provides an insightful overview of the importance of data integrity in reproducible research, using an analogy of the data life cycle as the growth of a child. The development of the child (data) from birth to maturity (collection to publication) must be documented by you, the parent (author). The video outlines several practices for maintaining data quality, including documenting each stage of the data collection process, carefully entering and cleaning data, and using tools like R and GitHub to track changes. Having now acquired these software tools myself, I too understand their value for tracking changes. The use of R Markdown is highlighted as a way to generate reproducible results by combining code and output in a single document. The final step is to deposit both data and code in a public repository, making research open and available for collaboration and the correction of errors. This is crucial for reproducible science, as it enables transparency and helps avoid mistakes. The video’s message emphasizes the relevance of reproducible research and data management in ensuring the credibility and reliability of scientific findings, particularly in biology, where data accuracy and integrity are vital for understanding complex systems that cannot (through language, at least) communicate for themselves.
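
The change-tracking practice described above can be sketched in a few git commands. This is a minimal illustration only; the repository name, file name and commit message are invented, and the inline `-c` identity flags are just there so the sketch runs on a machine where git has not yet been configured.

```shell
# Create a project folder and put it under version control (names hypothetical)
mkdir my-analysis && cd my-analysis
git init -q

# Record a data-collection event in a log file, then commit it
echo "2025-01-10: collected 40 samples at site A" > data-log.txt
git add data-log.txt
git -c user.name="Me" -c user.email="me@example.com" \
    commit -q -m "Document data collection"

# The project history now shows who changed what, and when
git log --oneline
```

Every subsequent edit to the data log (or to analysis scripts) gets its own commit, so the full provenance of the dataset is preserved alongside it.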

Link: https://www.youtube.com/watch?v=58FUfrIM6bE

Contributor: Taylor; Github username: (taylormgilbert?)


5.32 Data management: The first step in reproducible research.

This paper highlights the importance of good data management in producing credible research findings. The authors claim that data collection is an often overlooked yet important aspect that should be considered to reduce research misconduct. Important aspects identified in the paper include data preparation, which involves proper organization of files, correctly labelled folders, and documentation that makes data easy to understand and access. The authors also stress interoperability, which enables others to use the data without the help of the primary investigators, as well as data storage and sharing policies, including data management plans. The paper is significant because it focuses on a real yet rarely addressed issue in research: while discussions of methodology and analysis are frequent, it brings to light the otherwise hidden work of data management and how it directly impacts the quality and replicability of research conclusions. The authors set out practical steps that researchers in any field can use to ensure that data is handled properly and the scientific process is not weakened. It is a call for researchers to progress from merely producing data to managing it as a product that can be used again.

Contributor: Amarah

Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10880825/

Reference: Soundararajan, S., Mishra, S., 2023. Data management: The first step in reproducible research. Indian J Occup Environ Med 27(4), 359-363.


5.33 TEDxCapeTown: Mike Markovina - Moving Sushi

Moving Sushi co-founder Mike Markovina discusses the findings of the Marine Resource Expedition (2008–2010), a 24-month trip through 42 nations that documented successful marine conservation initiatives. He highlights the significance of open-access data and cooperative research for understanding the social, economic, and ecological aspects of marine resource protection. Markovina emphasises that transparent research and efficient data management are essential for well-informed decision-making and long-term marine conservation strategies. To bring about significant change, he promotes a shift in viewpoint that emphasises community involvement and the human aspects of managing marine resources. Through narratives and visual documentation, Markovina demonstrates how information exchange and transparency can motivate collective efforts to save our oceans.

Link: https://www.youtube.com/watch?v=jvnjOUWk4js

Contributor: Demi


5.34 A taxonomy for reproducible and replicable research in environmental modelling (B.T. Essawy et al., 2020).

The paper discusses how cyberinfrastructure aids in reproducible workflows. It highlights tools such as GitHub and Figshare while introducing a new tool called Sciunit (http://sciunit.run). Sciunit integrates all digital artifacts of a research project—including code, data, and scripts—alongside the research paper, providing a more complete record of the research. Prior studies found that Sciunit improved reproducibility in computational analysis but lacked mechanisms for community sharing. In contrast, another tool called HydroShare (http://www.hydroshare.org) enables the sharing of code, data, and metadata but does not combine all research components into a single package. Therefore, using Sciunit and HydroShare together offers the best solution: Sciunit packages data and code, while HydroShare aids in sharing within a broader community.

Link: https://doi.org/10.1016/j.envsoft.2020.104753

Contributor: Duncan Sephton


5.35 Reproducibility in ecology and evolution: Minimum standards for data and code.

This paper highlights the growing need for reproducibility in ecology and evolution by establishing minimum standards for sharing data and code. While informal data sharing has long been a practice, the authors argue that modern research must provide accessible, well-documented datasets and analytical code to ensure transparency, facilitate reuse, and improve scientific progress. They advocate for open data repositories following FAIR principles (Findable, Accessible, Interoperable, Reusable) and emphasize that journals and funding bodies should require complete datasets, metadata, and code as a condition for publication. The paper also discusses acceptable exemptions for sensitive data, including human subjects and endangered species. By adopting standardized data-sharing practices, the ecological and evolutionary research communities can enhance reproducibility and long-term data utility.

Link: https://onlinelibrary.wiley.com/doi/10.1002/ece3.9961

Contributor: Bianca Louw


5.36 What Makes Science True? | NOVA

[YouTube video uploaded by “NOVA PBS Official”, produced by Dakin Henderson and Kelly Thomson] This video explores the reasons for and effects of the ‘reproducibility crisis’, where many (exploratory) studies are taken as fact without any attempt to reproduce them; this is in part because exploratory studies are funded while reproducibility studies are not. Many drugs fail when they reach the clinical trial phase, which is thought to be due to a lack of proper validation of the underlying studies before that point, suggesting why repeated testing is important in the initial research into a drug. Lastly, although most data on study replicability comes from psychology and the biomedical sciences, the issue is known to be far more widespread.

Link: https://youtu.be/NGFO0kdbZmk?si=3gTPU4O113mimgwD

Contributor: Catharina


5.37 A beginner’s guide to conducting reproducible research (Alston and Rick, 2021)

The “replication crisis” refers to the widespread difficulty of reproducing scientific results. This paper emphasises why we should strive towards reproducible research, the possible challenges, and the continuous steps we can take to produce it. Reproducible research allows for quicker and easier analysis without the need to start from scratch. It also signals transparency and trustworthiness to reviewers and scientific peers, thus increasing citations. This encourages follow-up studies and easy correction of mistakes. However, complex biological systems can be difficult to reproduce, proprietary programs prevent global accessibility, and legacy software and hardware may also hinder access to data. The standard for reproducibility involves ensuring reproducibility before, during and after analysis. Data should be backed up in multiple locations to prevent any loss, and converted to “tidy” format to allow easy analysis and manipulation. During analysis, it is important that all operations be done using annotated scripts to allow for repetition. Finally, the data and code, with all parameters, should be shared to ensure public accessibility. In conclusion, reproducible research is a process and should be consistently strived for.
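
The “tidy” reshaping step mentioned above can be sketched with a toy example. In R (the language of this book) one would typically use `tidyr::pivot_longer()`; the same idea is shown here in Python with pandas purely for illustration, and the column names (`site`, `year`, `count`) are invented.

```python
import pandas as pd

# Hypothetical "wide" field data: one column per year (not tidy).
wide = pd.DataFrame({
    "site": ["A", "B"],
    "2020": [14, 9],
    "2021": [11, 12],
})

# Tidy form: one row per observation, one column per variable.
tidy = wide.melt(id_vars="site", var_name="year", value_name="count")
print(tidy)
```

In the tidy result every row is a single observation (site, year, count), which is what makes downstream filtering, grouping and plotting straightforward.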

Reference: Alston, J.M. and Rick, J.A., 2021. A beginner’s guide to conducting reproducible research. Bulletin of the Ecological Society of America, 102(2), pp.1-14.

Contributor: Kauthar Ismail


5.38 Research Culture is Broken; Open Science can Fix It | Rachael Ainsworth | TEDxMacclesfield

Open science is a growing movement in which research is openly shared with the public, allowing for collaboration, contribution and access to the latest findings and methods. This TED talk dives into some of the current downfalls of our modern research culture and shows how open science could be the key to mitigating these problems.

Link: https://youtu.be/c-bemNZ-IqA

Contributor: Timothy Muthama


5.39 Wilson et al. (2017) “Good enough practices in scientific computing”

Computers are essential in science, yet many researchers lack basic computational skills, leading to lost data and inefficiencies. Just as lab work requires structured documentation, computational research needs organized data, clear workflows, and reproducibility.

“Good Enough Practices in Scientific Computing” offers practical guidelines for researchers of any skill level, covering data management, programming, project organization, and collaboration. Drawn from real-world experience, these simple strategies help streamline research, improve transparency, and make scientific work more efficient and reliable.

Link: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510

Contributor: Jess


5.40 Wildlife biology, big data, and reproducible research (Lewis, Vander Wal and Fifield, 2018).

This paper reviews the current state of reproducible research and data management in the field of wildlife biology. Lewis et al. (2018) share the merits of reproducible research and take you through how the type and scale of data have changed in the field of wildlife biology in the last few years. This change, according to them, necessitates a shift to increased reproducibility and more rigorous data management. They provide various case studies showing how it can be implemented (which provides tangible examples to wrap your head around the concept). They also suggest various courses of action that would allow reproducibility and data management to be mainstream in the field, including education, usage of freely available software such as GitHub and willingness to share data. Finally, they also mention potential limitations to this form of research management in the field of wildlife biology specifically, but suggest that a broader shift to ensuring reproducible research will be holistically beneficial to the field.

Link: https://doi.org/10.1002/wsb.847

Contributor: Connor


5.41 How FAIR are plant sciences in the twenty-first century? The pressing need for reproducibility in plant ecology and evolution

The paper states that the 21st century is the century of data science, and then emphasises why plant ecology and evolution studies must acknowledge the importance of specimens as primary data. Authors often neglect to collect herbarium specimens. Collecting specimens and making them publicly available is crucial: if only the data were accessible, errors could propagate into meta-analyses because the specimens could not be revisited and verified. Doing so also obeys the FAIR principles, allowing original data to be reused in reproducible research and to further education. The paper also touches on why specimen preservation has become a rare practice, attributing it to a lack of training and shifts towards more modern practices.

Link: https://doi.org/10.1098/rspb.2020.2597

Contributor: Naval