Description
The goal of Data Science is to improve decision making by basing decisions on insights extracted from large data sets. The emergence of Big Data and social media, together with the growth in computing power, has enabled the development of powerful methods for data analysis and modelling.
Today, data science drives decision making in nearly all parts of society, from email filtering to product recommendations and personalized advertising. In this golden age of data, it is therefore necessary to understand the core elements and mechanisms behind Data Science processes and tools.
Data Science is a dynamic, fast-growing field at the interface of Computer Science, Mathematics, Statistics and domain knowledge. The Data Science course aims to give students general knowledge of the concepts, algorithms, techniques and tools of the field, at a depth that provides a principled understanding of it.
The students will learn to conduct a Data Science project. They will learn to create models from algorithms and test them, using real or realistic fictitious data. They will also learn to evaluate the resulting models. Various software tools will be used to build the models. In addition, students will gain an understanding of Data Science practice by working on a data science project.
Learning objectives
- Understand the Data Science project lifecycle;
- Understand the basics of supervised and unsupervised Machine Learning techniques;
- Be able to accurately formulate the problem and prepare data;
- Be able to build a model using the most suitable algorithm and estimate the parameters;
- Be able to evaluate and fit the model;
- Analyze and explain the obtained results;
- Understand the ethical challenges;
- Learn the basics of Python.
- Scheduled independent work (timetabled): 15
- Lectures: 15
Minimum / maximum enrollment:
15/40
Degree(s) concerned
Parent teaching unit (UE)
- UE-M-MILES-ENM : Enterprise & Network Management
For students of the degree Management of International Lean and Supply chain projects
N/A
Grading format
Numeric, out of 20
For students of the degree Management of International Lean and Supply chain projects
Your assessment methods:
Case studies
Individual Exam
In-class presentation
A resit is allowed (the higher of the two grades, capped at a threshold grade)
The course coefficient is: 3
Detailed program
Session 1: SFtF 1 (3 hours): This SFtF session presents an introduction to Data Science. It focuses on what data and datasets are, how to manage a Data Science project (CRISP-DM, TDSP), and the Data Science ecosystem. It also includes a refresher on statistics and probability.
Session 2: SNFtF 1 (3 hours): This session opens with an individual assessment on the fundamentals of Statistics and Probability. Students are then required to complete a brief Python tutorial; Python will be used in the following face-to-face and non-face-to-face sessions. The main objectives of this preliminary work are to familiarize students with Python, to guarantee a homogeneous knowledge base in programming, and to assess their skills in statistics and probability.
Session 3: SFtF 2 (3 hours): The first part of this SFtF session presents the big picture of Machine Learning. The focus then moves to supervised Machine Learning methods. First, regression methods, including Time Series Analysis methods, will be detailed. The session continues with classification methods. Model evaluation will also be presented for both families of methods. For some of the supervised Machine Learning methods presented, test data will be provided so that students can try the techniques (“learn by doing”).
Session 4: SNFtF 2 (3 hours): During this SNFtF session, the students will continue practicing by creating models using regression and/or classification methods. Students will have to identify the problem, choose the most suitable algorithm, choose the parameters, build the model, and finally discuss the solution obtained. They will work individually and then compare their results in groups of three. They will have to present and discuss their results in the next SFtF session (5 minutes per group).
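The workflow practiced in Sessions 3 and 4 (choose an algorithm, estimate its parameters, evaluate the model on held-out data) can be sketched in plain Python. This is only an illustrative sketch, not course material: the synthetic data, the helper names `fit_line` and `rmse`, and the 80/20 split are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on (xs, ys)."""
    squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5

# Synthetic data (assumption): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

# Hold out the last 20% of the points for evaluation.
split = int(0.8 * len(xs))
train_x, test_x = xs[:split], xs[split:]
train_y, test_y = ys[:split], ys[split:]

# Fit on the training set only, then measure the error on unseen data.
slope, intercept = fit_line(train_x, train_y)
test_error = rmse(test_x, test_y, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} test RMSE={test_error:.2f}")
```

Evaluating on points that were not used for fitting is what makes the error estimate meaningful; in practice a library such as scikit-learn would typically replace the hand-written helpers.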
Session 5: SFtF 3 (3 hours): At the beginning of this SFtF session, the students will briefly present their results and comments from SNFtF session 2. Then, unsupervised Machine Learning methods will be presented, as well as model evaluation for these methods. Test data will be provided so that students can try the techniques (“learn by doing”). The topic of Deep Learning will also be introduced during this session.
Session 6: SNFtF 3 (3 hours): During this SNFtF session, the students will continue practicing by creating models using clustering methods. Students will have to identify the problem, choose the most suitable algorithm, choose the parameters, build the model, and finally discuss the solution obtained. They will work individually and then compare their results in groups of three. They will have to present and discuss their results in the next SFtF session (5 minutes per group).
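As an illustration of the clustering practice in Sessions 5 and 6, the sketch below implements k-means (Lloyd's algorithm) in plain Python and reports the within-cluster sum of squares as a simple evaluation measure. The syllabus does not name a specific clustering algorithm, so k-means is an assumption here, as are the synthetic two-blob data and the helper names `kmeans`, `assign` and `inertia`.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assign(points, centroids)

def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))

# Synthetic data (assumption): two Gaussian blobs around (0, 0) and (5, 5).
rng = random.Random(42)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
points = blob_a + blob_b

centroids, labels = kmeans(points, k=2)
print("centroids:", centroids)
print("inertia:", inertia(points, centroids, labels))
```

Running the same code with several values of `k` and comparing the resulting inertia is one common way to discuss the choice of parameters.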
Session 7: SFtF 4 (3 hours): At the beginning of this SFtF session, the students will briefly present their results and comments from SNFtF session 3. Preparing and cleaning data before modelling a data problem is an important topic that will be introduced during this session. Principles of Data Visualization will also be presented. Finally, ethical challenges regarding the use of data will be introduced, ahead of the preparation of a debate among students.
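The data preparation step introduced in Session 7 can be illustrated with a small plain-Python sketch that normalizes text fields, drops rows with missing or non-numeric values, and removes duplicates. The records, field names, and the `clean` helper are invented for this example and are not part of the course material.

```python
# Hypothetical raw records, as they might come out of a CSV export.
raw_rows = [
    {"city": " Albi ", "temperature": "21.5"},
    {"city": "albi", "temperature": "21.5"},    # duplicate once normalized
    {"city": "Toulouse", "temperature": ""},    # missing value
    {"city": "Paris", "temperature": "n/a"},    # not a number
    {"city": "Lyon", "temperature": "19"},
]

def clean(rows):
    """Normalize city names, parse temperatures, drop bad and duplicate rows."""
    cleaned = []
    seen = set()
    for row in rows:
        city = row["city"].strip().title()
        try:
            temperature = float(row["temperature"])
        except ValueError:
            continue  # drop rows whose temperature is missing or not numeric
        key = (city, temperature)
        if key in seen:
            continue  # drop rows that duplicate an earlier one
        seen.add(key)
        cleaned.append({"city": city, "temperature": temperature})
    return cleaned

cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Dropping incomplete rows is only one possible policy; depending on the problem, imputing a value or flagging the row may be more appropriate.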
Session 8: SNFtF 4 (3 hours): During this SNFtF session, the students will practice Data Visualization by applying various solutions (individually) to several provided datasets and discussing the best one (in groups of three). Then, they will be asked to prepare a debate on Data Science ethics in groups of 10. Each group has to choose a relevant question regarding Data Science ethics related to industry. Half of the group will prepare the arguments for, while the other half will prepare the arguments against. Students are advised not to share their arguments with the opposing side.
Session 9: SFtF 5 (3 hours): At the beginning of this 3-hour SFtF session, the students will debate in groups of 10, using the arguments prepared during SNFtF session 4. At the end of the debates, they will form groups of three to work on the case/problem. It is recommended that each group bring together students with different backgrounds: students with a strong background in computer science or statistics should be spread across the groups rather than placed together, in order to facilitate the learning of all students.
Session 10: SNFtF 5 (3 hours): In this 3-hour SNFtF session, each group will continue working on its data science problem, following the steps of the CRISP-DM process. Once the model runs correctly, the group will check its validity and analyze the results. By the end of this session, each group is expected to have obtained model results on the provided data.