filmeu

Class Advanced Data Science

  • Presentation

    'Advanced Data Science' develops a rigorous, end-to-end analytical practice using (mainly) the Ames Housing Dataset as a sustained case study. The module takes students from data characterisation (variable typing, missing-data diagnosis, and quality assessment) through association analysis and collinearity detection, to the construction and evaluation of predictive models. A central portion covers dimensionality reduction (SVD, PCA, and factor models), emphasising correct application within modelling pipelines and interpretation as latent-variable hypotheses. The final weeks address decision-oriented model evaluation, interpretability, and algorithmic fairness. Reproducibility is a cross-cutting concern: students work in Python notebooks throughout, culminating in an integrated reproducible report subject to peer review.
  • Code

    ULHT6347-23554
  • Syllabus

    Advanced predictive modelling: regression and classification models with complex datasets and error analysis.
    Treatment and interpretation of missing data: patterns of absence (MCAR, MAR, MNAR) and application of conditional, justified imputation strategies.
    Data typing and structure: analytical definition of variable types, creation of data dictionaries, and understanding the relationship between encoding and statistical meaning.
    Feature engineering and dimensionality reduction: application of SVD, PCA, and factor analysis to generate new structured representations.
    Performance metrics and model explainability (XAI): evaluation with robust metrics (MAE, RMSE, PR-AUC), permutation importance, and SHAP.
    Algorithmic fairness and bias assessment: fairness criteria (demographic parity, equalized odds); analysis and communication of biases in predictive models.
    Reproducibility and professional best practices: reproducible projects and reports.
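
    As an illustration of the error metrics and fairness criteria named in the syllabus, the following is a minimal sketch in plain Python. It is not course material: the function names and the toy data are hypothetical, binary labels and predictions are assumed to be coded 0/1, and every group is assumed to contain both true labels. In practice a library such as scikit-learn or Fairlearn would normally be used instead.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more than MAE."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def _positive_rate(preds):
    """Share of predictions equal to 1 (predictions are assumed to be 0/1)."""
    return sum(preds) / len(preds)


def demographic_parity_difference(y_pred, group):
    """Largest gap between groups in the rate of positive predictions."""
    rates = [
        _positive_rate([p for p, g in zip(y_pred, group) if g == name])
        for name in set(group)
    ]
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, group):
    """Largest gap between groups in TPR or FPR, whichever is larger.

    Assumes each group contains both true labels (0 and 1).
    """
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        rates = [
            _positive_rate(
                [p for t, p, g in zip(y_true, y_pred, group)
                 if g == name and t == label]
            )
            for name in set(group)
        ]
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Hypothetical toy data: two groups, "a" and "b", four cases each.
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 0, 1, 1, 1, 0]
toy_group = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(mae([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))                   # 1.0
print(demographic_parity_difference(toy_pred, toy_group))      # 0.5
print(equalized_odds_difference(toy_true, toy_pred, toy_group))  # 0.5
```

    In the toy data, group "b" receives positive predictions three times as often as group "a" (0.75 against 0.25), so the demographic parity gap is 0.5; the true-positive and false-positive rates also each differ by 0.5 between the groups.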
  • Objectives

    Knowledge: Students will deepen their understanding of data characterisation and diagnosis techniques, predictive modelling (regression and classification), dimensionality reduction (SVD, PCA, and factor models), and model explainability.
    Skills: Students will be able to conduct rigorous missing-data analysis, build and evaluate predictive models, apply dimensionality reduction within modelling pipelines, and critically interpret results in light of their epistemological limitations.
    Competencies: Students will develop the ability to conduct a reproducible, end-to-end data science project, from data characterisation through to communication of results, integrating algorithmic fairness assessment and peer review. They will be prepared to reason critically about the meaning, limitations, and ethical implications of the models they build.
  • Teaching methodologies

    Interactive peer review: collaborative feedback among students to enhance mutual learning.
    Immersion sessions: practical sessions that complement theoretical classes.
    Project-based learning: development of projects covering the entire data science lifecycle.
    Python notebooks: primary tool for interactive coding and reproducible documentation.
    Vibecoding (AI-assisted coding from natural-language prompts): applying generative AI tools to enrich practical components and enable rapid iteration on analytical solutions.
  • References

    Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and Other Stories. Cambridge University Press.
    Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.). Wiley.
    Note: Students are not expected to buy any books for this module.
  • Assessment

     

    Description      Deadline      Weighting
    Final Project    08-06-2026    50%
    Theory Test      18-05-2026    35%
    Ames Project     06-03-2026    15%

     

    Use of generative AI is permitted only in accordance with specific rules, which will be discussed at the start of the semester and revisited during it.
