By the end of this course, you will be able to:
Manage research data:
- understand the challenges posed by large volumes of data
- archive code and data on well-known archives such as Software Heritage and Zenodo
- integrate data into version control (git-annex)
- use structured binary data formats (FITS, HDF5)
Use tools and techniques for controlling the software environment:
- understand how software packages are built and managed
- work in controlled software environments on a daily basis
- deploy software environments as containers (e.g., with Docker)
- manage software environments using a functional package manager (e.g., with Guix)
Automate long or complex computations using scientific workflows:
- understand the challenges of scaling up: long calculations, distributed calculations
- choose a workflow tool adapted to your needs
- automate a data analysis using make and Snakemake
- control the software environments of a workflow
Following the success of the MOOC "Reproducible research: methodological principles for transparent science", the authors continue exploring reproducibility with a focus on massive data and complex calculations. These two MOOCs complement each other and offer a coherent training program on the subject.
In this second MOOC, you will learn how to manage large datasets and complex computations in controlled software environments, using formats such as JSON, FITS, and HDF5, platforms such as Zenodo and Software Heritage, and tools such as git-annex, Docker, Singularity, Guix, make, and Snakemake. Key concepts are introduced and applied through numerous hands-on exercises and a real-life use case on sunspot detection, demonstrating how to work in a reliable and reproducible way.
A new module for this session offers exercises illustrating how the tools and techniques we teach help in the daily practice of computational research. Interviews with experienced practitioners of reproducible research also discuss related tools, helping you decide whether to invest in more elaborate tools and which pitfalls you may encounter.
This MOOC consists of four independent modules that combine video lectures, quizzes, practical sessions, written course materials, and many exercises that give you hands-on experience with the tools and methods presented.
Most of the exercises can be carried out in a JupyterLab environment made available to each MOOC learner. Some exercises require a Linux computer and the ability to install software on it.
This course is for everyone who relies on a computer to perform data analysis. You should have some experience with running commands in a terminal, and a basic knowledge of Git (at the level of the first MOOC) and Scientific Python.
An Open Badge for successful completion of the course will be issued on request to learners who obtain an overall score of at least 50% correct answers across all the quizzes and learning activities. Assessment is based on quizzes and practical exercises.