Machine learning in Python with scikit-learn

Ref. 41026
Categories: Computer science and programming, Digital and technology
  • Duration: 13 weeks
  • Effort: 36 hours
  • Pace: ~2h45/week
Build predictive models with scikit-learn and gain a practical understanding of the strengths and limitations of machine learning!
No open course runs

What you will learn

At the end of this course, you will be able to:

  • Grasp the fundamental concepts of machine learning
  • Build a predictive modeling pipeline with scikit-learn
  • Develop intuitions behind machine learning models from linear models to gradient-boosted decision trees
  • Evaluate the statistical performance of your models
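As a rough illustration of the second and fourth objectives, here is a minimal sketch of a predictive modeling pipeline evaluated by cross-validation. It is not taken from the course materials: the synthetic dataset, the choice of logistic regression, and the variable names are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Chaining preprocessing and model means the scaler is fitted only on the
# training part of each cross-validation split, which avoids data leakage.
model = make_pipeline(StandardScaler(), LogisticRegression())

# 5-fold cross-validation; the default score for a classifier is accuracy.
cv_results = cross_validate(model, X, y, cv=5)
scores = cv_results["test_score"]
print(f"Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the spread of the scores alongside their mean gives a first idea of how much the estimate depends on the particular split of the data.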

Description

Predictive modeling is a pillar of modern data science. In this field, scikit-learn is a central tool: it is easily accessible yet powerful, and it dovetails naturally with the wider ecosystem of data-science tools based on the Python programming language.

This course is an in-depth introduction to predictive modeling with scikit-learn. Step-by-step, didactic lessons introduce the fundamental methodological and software tools of machine learning, making the course a stepping stone to more advanced challenges in artificial intelligence, text mining, or data science.

The course is more than a cookbook: it will teach you to be critical about each step of the design of a predictive modeling pipeline, from choices in data preprocessing to choosing models, gaining insights into their failure modes, and interpreting their predictions.

The training will be essentially practical, focusing on examples of applications with code executed by the participants.

The MOOC is completely free of charge. All the course materials are also available in a GitHub repository.

The authors of the course are scikit-learn core developers; they will be your guides throughout the training!

Format

The course will cover practical aspects through the use of Jupyter notebooks and regular exercises. Throughout the course, we will highlight scikit-learn best practices and give you the intuition to use scikit-learn in a methodologically sound way.

Prerequisites

The course aims to be accessible without a strong technical background. The requirements for this course are:
- basic knowledge of Python programming: defining variables, writing functions, importing modules
- some prior experience with the NumPy, pandas and Matplotlib libraries is recommended but not required

For a quick introduction to these libraries, you can use the following resources: Introduction to NumPy and Matplotlib by Sebastian Raschka and 10 minutes to pandas.
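As a rough indication of the Python level listed in the requirements above, the following sketch imports a module, defines a variable, and writes a function. The data and the function name `center` are hypothetical and chosen only for illustration.

```python
# Importing a module from the standard library
import statistics

# Defining a variable
ages = [25, 32, 47, 51]


# Writing a function
def center(values):
    """Return the values shifted so that their mean is zero."""
    mean = statistics.mean(values)
    return [v - mean for v in values]


centered = center(ages)
print(centered)
```

If this snippet reads comfortably, the Python prerequisite is probably met.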

Assessment and certification

Students' work in the course is assessed through quizzes after the lessons and programming exercises at the end of every module. A certificate will be issued by FUN, confirming successful completion of the MOOC.

Course plan

    • Machine Learning concepts
    • Tabular data exploration
    • Fitting a scikit-learn model on numerical data
    • Handling categorical data
    • Overfitting and Underfitting
    • Validation and learning curves
    • Bias versus variance trade-off
    • Manual tuning
    • Automated tuning
    • Intuitions on linear models
    • Linear regression
    • Modelling a non-linear relationship between data and target
    • Regularization in linear models
    • Linear model for classification
    • Intuitions on tree-based models
    • Decision tree in classification
    • Decision tree in regression
    • Hyperparameters of decision tree
    • Ensemble method using bootstrapping
    • Ensemble based on boosting
    • Hyperparameter tuning with ensemble methods
    • Comparing a model with simple baselines
    • Choice of cross-validation
    • Nested cross-validation
    • Classification metrics
    • Regression metrics
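
To give a flavour of how several items of the plan fit together (automated tuning, comparison with a simple baseline, nested cross-validation), here is a minimal sketch. It is not taken from the course: the synthetic data, the decision tree, and the small grid of `max_depth` values are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Simple baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")
baseline_scores = cross_val_score(baseline, X, y, cv=5)

# Inner loop: grid search selects max_depth using 3-fold cross-validation.
tuned_tree = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8]},
    cv=3,
)

# Outer loop: 5-fold cross-validation scores the whole tuning procedure on
# data that the inner search never saw (nested cross-validation).
nested_scores = cross_val_score(tuned_tree, X, y, cv=5)

print(f"Baseline accuracy:   {baseline_scores.mean():.3f}")
print(f"Tuned tree accuracy: {nested_scores.mean():.3f}")
```

Keeping the tuning inside the outer loop prevents the reported score from being optimistically biased by the hyperparameter search itself.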

Course runs

Archived

  • From May 18, 2021 to July 14, 2021
  • From Feb. 15, 2022 to May 17, 2022
  • From Oct. 18, 2022 to Jan. 17, 2023

Course team

Arturo Amor

Arturo Amor is an engineer at Inria. He is in charge of broadening the scikit-learn documentation's accessibility to all kinds of users.

Loïc Estève

Loïc Estève is a research engineer at Inria. He has been a scikit-learn core developer since 2016.

Olivier Grisel

Olivier Grisel is a machine learning engineer at Inria. He has been a scikit-learn core developer since 2010.

Guillaume Lemaître

Guillaume Lemaître is a research engineer at Inria. He has been a scikit-learn core developer since 2017.

Gaël Varoquaux

Gaël Varoquaux is a research director at Inria. He is one of the creators of scikit-learn and the project manager for the scikit-learn consortium.

Thomas Schmitt

Thomas Schmitt is a machine learning engineer at Inria.

Organizations

Inria

Partnership

Hosting the Jupyter notebook execution environment for this MOOC.

Social networks

Follow us on Twitter @InriaLearnLab and feel free to use the #ScikitLearnMooc hashtag.

License

License for the course content

Attribution

You are free to:

  • Share — copy and redistribute the material in any medium or format
  • Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

License for the content created by course participants

Attribution-NonCommercial-NoDerivatives

You are free to:

  • Share — copy and redistribute the material in any medium or format

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • NonCommercial — You may not use the material for commercial purposes.
  • NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.