Exploratory Multivariate Data Analysis
- Duration: 5 weeks
- Effort: 25 hours
- Pace: Self-paced
What you will learn
At the end of this course, you will be able to:
- summarise and synthesise datasets using simple graphs;
- use visualization methods suited to multidimensional exploratory analysis;
- interpret the results of a factor analysis and a classification;
- recognise the method suited to exploring a dataset according to the nature and structure of its variables;
- analyse the responses to a survey;
- perform text mining;
- implement factorial and classification methods in the free software R.
In summary, you will be able to implement and interpret multidimensional exploratory analyses.
Exploratory multivariate data analysis has long been studied and taught in France, following a distinctly French approach. This course focuses on four essential and basic methods, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, and clustering. An extension to Multiple Factor Analysis (MFA) will give you the opportunity to analyse more complex datasets that are structured by groups.
We hope that with this course, participants will be fully equipped (theory, examples, software) to tackle real-life multivariate data.
This course is application-oriented: formalism and mathematical notation have been reduced as much as possible, while examples and intuition have been emphasized. The numerous exercises done with FactoMineR (a package for the free software R) will make participants efficient and reliable when analysing data.
This course will be held in English. It has been designed for scientists whose aim is not to become statisticians but who feel the need to analyze the data themselves. It is therefore addressed to practitioners who are confronted with the analysis of data in marketing, surveys, ecology, biology, geography, etc.
An undergraduate level is sufficient to grasp all the concepts introduced.
Basic knowledge of statistics is necessary, such as the correlation coefficient, the chi-squared test, and one-way ANOVA.
On the software side, an introduction to the R language is sufficient, at least at first.
Assessment and certification
Participants will focus on one theme per week and will have the opportunity to evaluate their learning progress via a weekly quiz. Each course sequence will be completed by a series of small quizzes and exercises. You will do your exercises directly in your web browser, and the correctness of your answers will be automatically assessed by the system.
At the end of the course, you will have to complete a final evaluation. Participants who answer more than 50% of the quiz and exercise questions correctly will receive a certificate of attendance.
- Data - Practicalities
Studying individuals and variables
Aids for interpretation
PCA in practice using FactoMineR
- Data - introduction and independence model
Visualizing the row and column clouds
Inertia and percentage of inertia
Correspondence Analysis in practice using FactoMineR
- Data - issues
Visualizing the point cloud of individuals
Visualizing the point cloud of categories - simultaneous representation
Multiple Correspondence Analysis in practice using FactoMineR
- Hierarchical clustering
An example, and choosing the number of classes
Partitioning methods and other details
Characterizing the classes
Clustering in practice using FactoMineR
- Data - issues
Balancing groups and choosing a weighting for the variables
Studying and visualizing the groups of variables
Visualizing the partial points
Visualizing the separate analyses
Taking into account groups of categorical variables
Taking into account contingency tables
Multiple Factor Analysis in practice using FactoMineR
Other course runs
- From Jan. 7, 2022 to April 29, 2022
- From March 7, 2022 to May 11, 2022
- From March 3, 2021 to May 11, 2021
- From March 3, 2021 to May 17, 2021
- From March 4, 2019 to May 14, 2019
License for the course content
You are free to:
- Share — copy and redistribute the material in any medium or format
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial — You may not use the material for commercial purposes.
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.