
Class Introduction to Data Science

  • Presentation


    The Introduction to Data Science course aims to give students essential data analysis skills from a multidisciplinary perspective, reflecting the nature of Data Science itself. By presenting fundamental methodologies and techniques to treat, transform, construct and analyze data, this curricular unit aims to enable students to turn that analysis into knowledge and sustainable value for decision making.
    The practical component is a fundamental aspect of the course, so the ability to translate knowledge into practical actions and analytical decisions is particularly valued. The close connection with the business world, in answering business questions, will be illustrated throughout the course.

  • Code


  • Syllabus


    1. Introduction to Data Science

    • Importance and applications of Data Science
    • Project Workflow: Practical Examples
    • Data types: Structured, Semi-structured, and unstructured
    • Challenges in Data Science


    2. Python for Data Science

    • Setup: Jupyter notebook
    • NumPy
    • Pandas
    • ydata-profiling

    3. Data pre-processing

    • Cleaning and preparing structured data (data wrangling: slicing, groupby, pivoting, missing values, imputation, duplicates, outliers, etc.)
    • Unstructured data processing - Text (lemmatization, stemming, etc.)

    4. Introduction to Machine Learning: supervised and unsupervised models

    • Basic concepts
    • Linear Regression
    • Logistic Regression
    • Dimensionality Reduction (PCA)


    5. Micro-Services and APIs Concepts

    • API Definition
    • API Design
    • Implementing APIs in Python
    • Operating APIs in a prediction process 
  • Objectives


    The course aims to give the student the skills to:

    • LG1. Understand the importance of Data Science in the real world
    • LG2. Understand the nature of data
    • LG3. Understand the main techniques and methods in Python programming used by data scientists, through their practice
    • LG4. Be able to perform basic data preparation and pre-processing tasks
    • LG5. Perform exploratory data analysis in Python
    • LG6. Understand a data scientist's workflow and be able to reason about solving problems with data
    • LG7. Understand and implement machine learning methods, supervised and unsupervised
    • LG8. Know the performance metrics of a model
    • LG9. Understand the concepts of API and design micro-services in a data analysis context
    • LG10. Be able to implement APIs to support machine learning methods in Python
  • Teaching methodologies and assessment


    Theoretical concepts are introduced in class and then complemented with real-world examples. For each topic, students are given a set of exercises that apply the theoretical concepts. Exercises are discussed and solved in class, and students are invited to raise any questions they have.


    Support materials and exercises with resolution suggestions will be available on Moodle.

    Continuous assessment, adapted to students' progress, is considered good practice. Individual monitoring and availability to answer questions whenever necessary are essential to each student's performance.

  • References


    • Grus, J. (2019). Data science from scratch: First principles with Python. O'Reilly Media.