At the end of this course, you will be able to:
Manage research data:
- understand the challenges posed by large volumes of data
- archive data on well-known archives such as Software Heritage and Zenodo
- put data under version control (git-annex)
- use structured binary data formats (FITS, HDF5)
Use tools and techniques for controlling the software environment:
- understand how software packages are built and managed
- deploy software environments as containers (e.g. Docker)
- manage software environments using a functional package manager (e.g. Guix)
- work in controlled software environments on a daily basis
Automate long or complex computations using workflows:
- understand the challenges of scaling up: long calculations, distributed calculations
- choose a workflow tool adapted to your needs
- automate a data analysis using make and snakemake
- control the software environments of a workflow
Following the success of the MOOC "Reproducible research: methodological principles for transparent science", the authors continue on the same theme, dealing more specifically with the issues of massive data and the complex calculations associated with them. These two MOOCs complement each other and offer a coherent training program on the subject.
In this second MOOC, we will show you how to improve your practices for managing large datasets and complex computations in controlled software environments.
The strength of this new MOOC lies in a general and systematic presentation of the major concepts and of how they translate into practical solutions through numerous hands-on sessions with state-of-the-art open-source tools.
This MOOC consists of three modules that combine video lectures, practical sessions, written course materials, and many exercises for getting hands-on experience with the tools and methods that are presented.
Most of the exercises can be carried out in a JupyterLab environment made available to each MOOC learner. Some exercises require a Linux computer on which you are able to install system software.
This course is for everyone who relies on a computer to perform data analysis. You should have some experience with running commands in a terminal, and a basic knowledge of Git (at the level of the first MOOC) and of scientific Python.
An Open Badge for successful completion of the course will be issued on request to learners who obtain an overall score of at least 50% correct answers across all the quizzes and learning activities. Assessment is based on quizzes and practical exercises.
Categories
You are free to:
Under the following terms: