filmeu

Class Exploratory Data Analysis

  • Presentation


    The main objective of this curricular unit is to provide students with the skills and tools needed for exploratory and descriptive data analysis. Students are expected to gain a basic understanding of their data and of the relationships between variables.

  • Code


    ULHT6634-1790
  • Syllabus


    1. Introduction to Exploratory Data Analysis and understanding its importance in Data Science;

     

    2. Characterization of the different types of data and their nature;

     

    3. Descriptive Statistics:

    • Frequency tables;
    • Descriptive Statistics: measures of location, dispersion, asymmetry, and kurtosis;
    • Graphical displays;

     

    4. Bivariate Analysis:

    • Pearson's linear correlation coefficient;
    • Linear regression;
    • Independence between variables;


    5. Graphical representations:

    • Techniques and best practices;
    • Automated Exploratory Data Analysis;

     

    6. Python for exploratory data analysis:

    • Introduction to the NumPy and Pandas libraries for data cleaning and analysis;
    • Introduction to the Matplotlib, Seaborn, Plotly and Bokeh libraries for data visualization;
    • Tools for automated Exploratory Data Analysis (EDA).
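
    As a rough illustration of syllabus items 3, 4 and 6, the sketch below computes a frequency table, basic descriptive statistics, Pearson's correlation coefficient and a least-squares regression line with Pandas and NumPy. The dataset and the column names (hours, score, group) are hypothetical and invented for this example; they are not part of the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "score": [52.0, 55.0, 61.0, 64.0, 70.0, 73.0],
    "group": ["A", "B", "A", "B", "A", "A"],
})

# Frequency table for a categorical variable.
freq = df["group"].value_counts()

# Measures of location, dispersion, asymmetry, and kurtosis.
summary = {
    "mean": df["score"].mean(),
    "median": df["score"].median(),
    "std": df["score"].std(),        # sample standard deviation (ddof=1)
    "skewness": df["score"].skew(),
    "kurtosis": df["score"].kurt(),  # excess kurtosis
}

# Pearson's linear correlation coefficient between two numeric variables.
r = df["hours"].corr(df["score"])

# Simple linear regression: score = slope * hours + intercept (least squares).
slope, intercept = np.polyfit(df["hours"].to_numpy(), df["score"].to_numpy(), deg=1)

print(freq)
print(summary)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

    Plotting and automated EDA tools are left out to keep the sketch short.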
  • Objectives


    By the end of this course unit, the student is expected to:

     

    • Understand what Exploratory Data Analysis is and how it fits into the Data Science workflow;
    • Understand the nature of different types of data and the need to process them;
    • Understand and apply descriptive statistics in data analysis;
    • Perform bivariate analyses;
    • Organize and synthesize the data to obtain the necessary information to answer the questions that are being discussed;
    • Create objective and effective visualizations of data that result in concrete actions;
    • Use the Python language and its libraries for data analysis.
  • Teaching methodologies and assessment


    Theoretical concepts are introduced in class and then complemented with real-world examples. For each topic, students are given a set of exercises in which they apply the theoretical concepts. Exercises are discussed and solved in class, and students are invited to raise any questions they may have.

    Support materials and exercises with suggested solutions will be available on Moodle.

    Continuous assessment, adapted to the students' progress, is considered good practice. Individual monitoring and availability to clarify doubts whenever necessary are essential to each student's performance.

  • References


    • VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly.
    • Data Visualization with Python for Beginners: Visualize Your Data using Pandas, Matplotlib and Seaborn (2020). AI Publishing LLC.
    • Grus, J. (2019). Data Science from Scratch: First Principles with Python. O'Reilly Media.
    • Murteira, B. & Antunes, M. (2012). Probabilidades e Estatística. (Vol.1). Lisboa: Escolar Editora.