# Computational Statistics and Data Analysis (MVComp2)

Summer Semester 2024: April 15, 2024 to July 19, 2024

**Lectures**: Wed 11-13; **Exercises**: Wed 14-16

**Location**: INF 227 01.403/404 (lectures and exercises)

Lecturers:

- Prof. Dr. Tristan Bereau, Institute for Theoretical Physics, Heidelberg University
- Prof. Dr. Daniel Durstewitz, Zentralinstitut für Seelische Gesundheit, Mannheim

Tutors: Luis Walter, Florian Hess, Alena Braendle, Max Ingo Thurm

6 credit points

## Course links

You will find the course content on Moodle.

## Course description

This lecture introduces basic methods and approaches in computational statistics and data analysis that are of great importance to empirical problems in the natural sciences. It gives an overview of relevant concepts and theorems in probability theory and statistics, then proceeds to more modern approaches, including automatic differentiation and machine learning. Lectures are accompanied by computational exercises in Python. Students will learn to analyze data sets and interpret the results from a solid, theoretically grounded statistical perspective; devise statistical and machine learning models of experimental situations; infer the parameters of these models from empirical observations; and test hypotheses.

## Prerequisites

- Linear (Matrix) Algebra
- Basic calculus (derivatives & integrals)
- Basic programming skills in Python

## Tentative course outline

- Basic concepts in probability theory
- Random variables; expectations, variances, covariances, and their properties
- Discrete & continuous probability distributions
- Moment-generating functions, central limit theorem, and multivariate distributions
- Statistical models & inference: parameter estimation
- Hypothesis tests: tests, confidence intervals, bootstrap method
- Linear regression: least squares, generalized linear model
- Regularization: Ridge & LASSO regression, MAP estimation
- Nonlinear regression: basis expansions, neural networks
- Classification: k-nearest neighbors, logistic regression, linear discriminant analysis
- Kernel methods: Mercer kernels, Gaussian processes, support vector machines
- Model selection: Jeffreys scale, BIC, bias-variance tradeoff
- Dimensionality reduction: principal component analysis, factor analysis
- Information theory

## Main references

- Wackerly, D., Mendenhall, W., & Scheaffer, R. L. (2014). Mathematical statistics with applications. Cengage Learning.
- Kevin P. Murphy, Probabilistic Machine Learning: An Introduction, MIT Press (2022), https://probml.github.io/pml-book/book1.html
- Kevin P. Murphy, Probabilistic Machine Learning: Advanced Topics, MIT Press (2023), https://probml.github.io/pml-book/book2.html
- Mehta, P., Bukov, M., Wang, C. H., Day, A. G., Richardson, C., Fisher, C. K., & Schwab, D. J. (2019). A high-bias, low-variance introduction to machine learning for physicists. Physics Reports, 810, 1-124. https://doi.org/10.1016/j.physrep.2019.03.001
- Luca Amendola, Lecture notes on Statistical Methods. https://www.thphys.uni-heidelberg.de/%7Eamendola/teaching/compstat-hd.pdf