Exploratory Multivariate Data Analysis
- Duration: 5 weeks
- Effort: 25 hours
- Pace: Self-paced
What you will learn
At the end of this course, you will be able to:
- summarise and synthesise datasets using simple graphs;
- use visualization methods suited to multidimensional exploratory analysis;
- interpret the results of a factorial analysis and of a clustering;
- recognise the method suited to exploring a dataset according to the nature and structure of its variables;
- analyse the responses to a survey;
- carry out a text mining analysis;
- implement factorial and clustering methods in the free software R.
In summary, you will be able to implement and interpret multidimensional exploratory analyses.
Description
Exploratory multivariate data analysis has long been studied and taught in France, following a distinctly French approach. This course focuses on four essential and basic methods, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, and clustering. An extension to Multiple Factor Analysis (MFA) will give you the opportunity to analyse more complex datasets that are structured by groups of variables.
We hope that with this course, participants will be fully equipped (theory, examples, software) to tackle real-life multivariate data.
Format
This course is application-oriented: formalism and mathematical notation have been kept to a minimum, while examples and intuition are emphasised. The numerous exercises done with FactoMineR (a package for the free software R) will make participants efficient and reliable when analysing data.
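To give a flavour of such exercises, here is a minimal sketch in R, assuming the FactoMineR package is installed. It runs a PCA and then a hierarchical clustering on the `decathlon` dataset shipped with the package. The choice of dataset and of supplementary columns is an illustrative assumption and is not taken from the course material.

```r
# Minimal sketch: PCA followed by hierarchical clustering with FactoMineR.
# Assumes install.packages("FactoMineR") has been run beforehand.
library(FactoMineR)

# decathlon: 41 athletes, 10 events (columns 1-10), Rank and Points
# (columns 11-12) and the Competition factor (column 13).
data(decathlon)

# PCA on the 10 standardised events; Rank/Points and Competition are
# supplementary, so they do not influence the construction of the axes.
res.pca <- PCA(decathlon, scale.unit = TRUE,
               quanti.sup = 11:12, quali.sup = 13, graph = FALSE)

# Eigenvalues and percentages of inertia carried by each dimension.
print(res.pca$eig)

# Hierarchical clustering on the principal components;
# nb.clust = -1 lets HCPC choose where to cut the tree automatically.
res.hcpc <- HCPC(res.pca, nb.clust = -1, graph = FALSE)

# Number of athletes in each cluster.
print(table(res.hcpc$data.clust$clust))
```

Setting `graph = TRUE`, or calling `plot(res.pca, choix = "var")`, draws the corresponding graphs, such as the correlation circle of the variables.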
Prerequisites
This course will be held in English. It has been designed for scientists whose aim is not to become statisticians but who feel the need to analyze the data themselves. It is therefore addressed to practitioners who are confronted with the analysis of data in marketing, surveys, ecology, biology, geography, etc.
An undergraduate level is quite sufficient to grasp all the concepts introduced.
Basic knowledge of statistics is necessary, such as the correlation coefficient, the chi-squared test and one-way ANOVA.
On the software side, an introduction to the R language is sufficient, at least at first.
Assessment and certification
To follow this course, you can choose between two options. The DISCOVERY path gives you access to videos, quizzes and exchanges in the forum. Additionally, the QUALIFYING path gives you access to a qualifying exam.
- Discovery path
If you opt for this path, you will have access to the videos, the quizzes, the self-corrected exercises and the exchanges in the forum. No certificate is issued for this path. Registration is free.
- Qualifying path
In addition to the activities offered in the DISCOVERY path, the QUALIFYING path will allow you to obtain a certificate. To do this, you will have to take a remotely monitored exam lasting 1 hour 30 minutes and consisting of 20 multiple choice questions (MCQ), and give at least 10 correct answers.
The registration fee for the qualifying course is 60€.
Course plan
- Data - Practicalities
Studying individuals and variables
Aids for interpretation
PCA in practice using FactoMineR
- Data - introduction and independence model
Visualizing the row and column clouds
Inertia and percentage of inertia
Simultaneous representation
Interpretation aids
Correspondence Analysis in practice using FactoMineR
- Data - issues
Visualizing the point cloud of individuals
Visualizing the point cloud of categories - simultaneous representation
Interpretation aids
Multiple Correspondence Analysis in practice using FactoMineR
- Hierarchical clustering
An example, and choosing the number of classes
Partitioning methods and other details
Characterizing the classes
Clustering in practice using FactoMineR
- Data - issues
Balancing groups and choosing a weighting for the variables
Studying and visualizing the groups of variables
Visualizing the partial points
Visualizing the separate analyses
Taking into account groups of categorical variables
Taking into account contingency tables
Interpretation aids
Multiple Factor Analysis in practice using FactoMineR
Course runs
Archived
- From March 2, 2015 to April 7, 2015
- From March 4, 2019 to May 14, 2019
- From March 2, 2020 to July 9, 2021
- From March 3, 2021 to May 17, 2021
- From March 3, 2021 to May 11, 2021
- From March 7, 2022 to May 11, 2022
- From March 7, 2022 to May 12, 2022
Course team
François Husson
Magalie Houée-Bigot
License
License for the course content
Attribution-NonCommercial-NoDerivatives
You are free to:
- Share — copy and redistribute the material in any medium or format
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial — You may not use the material for commercial purposes.
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
License for the content created by course participants
All rights reserved
"All rights reserved" is a copyright formality indicating that the copyright holder reserves, or holds for its own use, all the rights provided by copyright law.