-
Presentation
The main objective of this Curricular Unit is to provide students with the necessary skills and tools for exploratory and descriptive data analysis. We expect the student to gain a basic understanding of their data and the relationships between variables.
-
Class from course
-
Degree | Semesters | ECTS
Bachelor | Semestral | 6
-
Year | Nature | Language
1 | Mandatory | Portuguese
-
Code
ULHT6638-1790
-
Prerequisites and corequisites
Not applicable
-
Professional Internship
No
-
Syllabus
1. Introduction to Exploratory Data Analysis and understanding its importance in Data Science;
2. Characterization of the different types of data and their nature;
3. Descriptive Statistics:
- Frequency tables;
- Summary measures: location, dispersion, asymmetry (skewness), and kurtosis;
- Graphical displays;
4. Bivariate Analysis:
- Pearson's linear correlation coefficient;
- Linear regression;
- Independence between variables;
5. Graphical representations:
- Techniques and best practices;
- Automated Exploratory Data Analysis;
6. Python for exploratory data analysis:
- Introduction to NumPy and Pandas libraries for data cleaning and analysis;
- Introduction to Matplotlib, Seaborn, Plotly and Bokeh libraries for data visualization;
- Tools for automated Exploratory Data Analysis (EDA).
-
Objectives
By the end of this course unit, the student is expected to:
- Understand what Exploratory Data Analysis is and how it fits into the Data Science workflow;
- Understand the nature of different types of data and the need to process them;
- Understand and apply descriptive statistics in data analysis;
- Perform bivariate analyses;
- Organize and synthesize data to obtain the information needed to answer the questions under discussion;
- Create objective and effective data visualizations that support concrete actions;
- Use the Python language and its libraries for data analysis.
-
Teaching methodologies and assessment
Theoretical concepts are introduced in class and then complemented with real-world examples. For each topic, students are given a set of exercises that apply the theoretical concepts. Exercises are discussed and solved in class, and students are invited to raise any questions they have.
Support materials and exercises with suggested solutions will be available on Moodle.
Continuous assessment, adapted to students' progress, is considered good practice. Individual monitoring and availability to answer questions whenever necessary are essential to each student's performance.
-
References
- VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly.
- Data Visualization with Python for Beginners: Visualize Your Data using Pandas, Matplotlib and Seaborn (2020). AI Publishing LLC.
- Grus, J. (2019). Data Science from Scratch: First Principles with Python. O'Reilly Media.
- Murteira, B. & Antunes, M. (2012). Probabilidades e Estatística (Vol. 1). Lisboa: Escolar Editora.
-
Office Hours
-
Mobility
No