Collaborate, communicate and publish code using git and github/gitlab.
This is meant to be a very basic guide on how to publish your code, software and data to make them available to a broader public or just to your colleagues next door (Read here why YOU should do it!). Please be aware that this guide is largely a compilation of existing resources on this topic. It is not finished, it does not claim to be the whole truth, and it is certainly not perfect. If you have experience with writing scientific code, you might find this guide incomplete. If you usually write code only for yourself without sharing it, you might feel overwhelmed. However, to establish some common ground, we want to raise awareness that your code is an integral part of your scientific work and publications.
Most of our publications and products are based on data processing and analysis. Good scientific practice therefore also includes the reproducibility and reusability of code and data, which are central concerns of the Helmholtz Open Science Initiative. In fact, making your research code and data tidy, reproducible and reusable is nowadays much easier and more fun than most scientists think, and it can considerably improve your ability to cooperate with your peers and colleagues in a transparent and trustworthy environment.
To prepare for the tutorial, check that git is installed and that you can use it in a terminal. If you have no git installation, follow the instructions of the official documentation. You will also need a platform for sharing your code: make sure you have a github account you can access, or use your gitlab account at the Helmholtz codebase. For the Helmholtz codebase, you can log in with any Helmholtz account via the Helmholtz AAI. To be able to push to github/gitlab, you also have to create and register an SSH key: use an existing SSH key on your local computer or create a new one, and store the public key, e.g., in your github profile settings and/or your gitlab account. For more detailed instructions, see, e.g., the official github documentation.
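The checks from the paragraph above can be sketched as a small bash script (on Windows, run it in a git bash terminal). It is a sketch under assumptions: it only looks for the OpenSSH default key file names (id_ed25519, id_ecdsa, id_rsa) in ~/.ssh, find_pubkey is a hypothetical helper name, and the e-mail address is a placeholder.

```shell
#!/usr/bin/env bash
# 1) Is git installed and on the PATH?
if command -v git >/dev/null 2>&1; then
  git --version
else
  echo "git not found: please install it first" >&2
fi

# 2) Hypothetical helper: print the path of the first public SSH key found
#    under the OpenSSH default names, or return 1 if there is none.
find_pubkey() {
  local key
  for key in "$HOME/.ssh/id_ed25519.pub" "$HOME/.ssh/id_ecdsa.pub" "$HOME/.ssh/id_rsa.pub"; do
    if [ -f "$key" ]; then
      echo "$key"
      return 0
    fi
  done
  return 1
}

if key=$(find_pubkey); then
  echo "Public key found in $key"
  echo "Paste the following line into your github/gitlab SSH key settings:"
  cat "$key"
else
  # Key generation is interactive (it asks for a passphrase), so it is only printed here.
  echo "No SSH key found. You can create one with:"
  echo '  ssh-keygen -t ed25519 -C "your.name@example.org"'
fi
```

Only the public key (the file ending in .pub) belongs in your github/gitlab settings; the private key never leaves your computer.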
If you are a Windows user with no experience in git or using the terminal, we recommend the following steps:
- Install git on your Windows machine (if it is not installed already). For example, you can run the following command in Windows PowerShell (assuming your default location is the drive U:):
U:\> winget install --id Git.Git -e --source winget
This should install all tools of the Git for Windows tool set.
- Open "Git GUI" and go to the menu "Help > Show SSH Key".
- If no key is found, click on "Generate Key" to generate a new SSH key. If you get an error about a missing .ssh folder, you might have to create it yourself, e.g., with mkdir .ssh
- Copy the content of the public SSH key.
- Go to github/gitlab in your web browser and open the SSH keys section of your settings (on github, "SSH and GPG keys"). Click on "New SSH key" and paste your public SSH key.
- Open a "Git Bash" terminal and test your connection, e.g., with the following command:
ssh -T git@github.com
The publication of code and data should receive the same attention and planning as a classical scientific paper. It should be an integral part of planning a project. The best approach is to always write your code with this in mind and to ask yourself honestly:
- Will somebody else be able to understand what I did?
- Will they be able to run my code and reproduce my results/plots (without complicated explanations)?
- How can I make life easier for them?
If you keep these questions in mind while writing your code, you are already on the right track.
Luckily, there are some very helpful tools and methods to get your code organized. The Helmholtz Open Science Seminar has presented very helpful guidance and a factsheet to help researchers get their code on track for easier collaboration, reproducibility and fun! Here is an excerpt from the factsheet:
Even before you have written any code, you should start your project by creating a project repository, e.g., on github or the Helmholtz codebase. This can be a great landing page for your project. If you start by writing a comprehensive README.md, you can simply refer colleagues and collaborators to your project page, where they can find all necessary information without you having to explain it all over again. You can also structure your project more efficiently by using the repository's issue tracker. Also check out the github/gitlab pages feature, which lets you easily create nice web pages from your project repository.
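A minimal sketch of creating such a repository locally with a first README.md, assuming a bash terminal. The project name my-project, the account your-user, the commit name/e-mail and the README text are placeholders; replace them with your own.

```shell
#!/usr/bin/env bash
set -e  # stop at the first failing command

# Placeholders: use your own project name and github/gitlab account.
PROJECT="my-project"
REMOTE="git@github.com:your-user/${PROJECT}.git"

mkdir -p "$PROJECT"
cd "$PROJECT"
git init

# Identity recorded in your commits (set here for this repository only).
git config user.name "Your Name"
git config user.email "your.name@example.org"

# A first README.md as the landing page of the project.
cat > README.md <<'EOF'
# my-project

What the code does and which publication or project it belongs to.

## Installation

How to install the dependencies (e.g., from environment.yml or requirements.txt).

## Usage

How to run the analysis and reproduce the figures.
EOF

git add README.md
git commit -m "Add README"
git branch -M main                 # name the default branch "main"
git remote add origin "$REMOTE"    # link the local repository to github/gitlab
```

The script stops before uploading anything. Once you have created an empty repository with the same name on github (or gitlab), run git push -u origin main inside the project folder; the SSH URL to use as remote is shown on the repository page. Most people set user.name and user.email once with git config --global instead of per repository.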
If you publish your code, you should be aware of some basic technical requirements that should be checked. If you have followed some of the advice above, you should easily be ready to publish. However, the minimum requirements are:
- A publicly accessible project repository.
- A README, preferably in markdown format, that includes some information on your project, further links and basic technical documentation.
- A license (check whether you have used GPL-licensed libraries!). See also here and here!
- An environment.yml or requirements.txt file that defines the software dependencies.
- A DOI, e.g., using zenodo, which works well with github.
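A rough sketch, assuming bash, of a local check for the file-based items of this list. check_publish_ready is a hypothetical helper: it only checks that the files exist. It cannot tell whether the repository is public, whether your license is compatible with the libraries you use, or whether a DOI exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_publish_ready DIR
# Prints one ok/missing line per file-based requirement and
# returns 1 if anything is missing (0 otherwise).
check_publish_ready() {
  local dir="${1:-.}" missing=0 f found

  if [ -f "$dir/README.md" ]; then
    echo "ok:      README.md"
  else
    echo "missing: README.md"; missing=1
  fi

  # Accept the common license file names.
  found=""
  for f in LICENSE LICENSE.md LICENSE.txt; do
    if [ -f "$dir/$f" ]; then found="$f"; fi
  done
  if [ -n "$found" ]; then
    echo "ok:      $found"
  else
    echo "missing: LICENSE"; missing=1
  fi

  if [ -f "$dir/environment.yml" ] || [ -f "$dir/requirements.txt" ]; then
    echo "ok:      dependency file"
  else
    echo "missing: environment.yml or requirements.txt"; missing=1
  fi

  return "$missing"
}

# Dependency files can be generated from an existing environment, e.g.:
#   python -m pip freeze > requirements.txt
#   conda env export --from-history > environment.yml
check_publish_ready . || echo "Not ready to publish yet."
```

Run it in the root of your repository. Note that pip freeze pins every installed package, so you may want to trim the resulting requirements.txt to what your code actually imports.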
You can use howfairis to automatically check some basic requirements of FAIR principles for your project repository.
Here is a short list of publications by GERICS employees which might be good examples for code and data publication:
- Climate Action Sheet "Stadtwald Karlsruhe im Klimawandel"
- Principles analysis
- LEAFlood model
- Kliwist modelchain
- Irrigation analysis
- pyremo package
- py-cordex
Further reading on these topics:
- The Turing Way Handbook
- Does your code stand up to scrutiny?
- Nature Checklist
- AGU Data and Software Guide
- Open up to Open Science
- Scientific collaboration and project management in github
- Barnes, N. Publish your computer code: it is good enough. Nature 467, 753 (2010). https://doi.org/10.1038/467753a
- UNESCO recommendations on open science
- Free git course