Learning Objectives:

- Basic matrix operations. You should be able to interpret all of these geometrically AND write down generic formulas for each.
  - Vector addition, subtraction, and scaling.
  - Matrix-vector multiplication, and interpretations in terms of the rows and columns of the matrix.
  - Matrix-matrix multiplication, and interpretations.
  - Dot products, including how they relate to projections and the angle between vectors. More generally, inner products.
  - Projection onto a vector or onto a subspace.
  - Matrix transposes. Transpose of products, transpose of sums.
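
A minimal sketch of projection onto a subspace, assuming NumPy and a matrix `A` with linearly independent columns (the helper name `project_onto_columns` is hypothetical). The projection of $b$ onto the column space of $A$ is $A(A^T A)^{-1}A^T b$, and the residual $b - p$ is orthogonal to every column of $A$. With a single column $a$ this reduces to $\frac{a \cdot b}{a \cdot a}\,a$.

```python
import numpy as np

def project_onto_columns(A, b):
    """Orthogonal projection of b onto the column space of A.

    Assumes the columns of A are linearly independent, so A^T A is invertible.
    """
    # Solve (A^T A) c = A^T b for the coefficients rather than forming an explicit inverse.
    coeffs = np.linalg.solve(A.T @ A, A.T @ b)
    return A @ coeffs

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])   # columns span the xy-plane in R^3
b = np.array([3.0, 4.0, 5.0])
p = project_onto_columns(A, b)
print(p)               # [3. 4. 0.]
print(A.T @ (b - p))   # [0. 0.]  (the residual is orthogonal to both columns)

# Single-vector case: projecting onto a = (1, 1, 0) gives (a.b / a.a) a = 3.5 * a.
a = np.array([[1.0], [1.0], [0.0]])
print(project_onto_columns(a, b))
```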

- Basic matrix properties
  - Understand the rank of a matrix both algebraically and geometrically. What does it mean for a matrix to have full rank?
  - Be able to describe the image and kernel of a matrix using set notation.
  - Know all of the common equivalent conditions for a matrix to be invertible, in terms of the eigenvalues, the columns, the determinant, etc.
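
For a square matrix, several of these equivalent conditions can be checked numerically. A minimal sketch, assuming NumPy; the function name `invertibility_checks` and the tolerance value are illustrative choices. In floating point each test needs a tolerance rather than an exact comparison with zero, and the kernel dimension comes from rank-nullity, $\dim \ker A = n - \operatorname{rank} A$.

```python
import numpy as np

def invertibility_checks(A, tol=1e-10):
    """Check several equivalent conditions for a square matrix A to be invertible.

    In exact arithmetic these all agree; in floating point a tolerance is needed.
    """
    n = A.shape[0]
    rank = np.linalg.matrix_rank(A)
    return {
        "full_rank": bool(rank == n),                      # columns are linearly independent
        "trivial_kernel": bool(n - rank == 0),             # dim ker(A) = n - rank(A) = 0
        "nonzero_det": bool(abs(np.linalg.det(A)) > tol),  # det(A) != 0
        "no_zero_eigenvalue": bool(np.all(np.abs(np.linalg.eigvals(A)) > tol)),
    }

print(invertibility_checks(np.array([[2.0, 1.0], [1.0, 3.0]])))  # every condition True
print(invertibility_checks(np.array([[1.0, 2.0], [2.0, 4.0]])))  # every condition False (column 2 = 2 * column 1)
```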

- Vector spaces
  - Definition of an abstract vector space / subspace.
  - Definition of linearly independent vectors.
  - Definition of basis, span, and dimension.
  - Orthogonality and orthogonal / orthonormal bases.
  - Know how to compute the coordinate vector for an element of an abstract vector space, with respect to a known basis.
  - Gram-Schmidt orthogonalization.
  - Know what a linear transformation is.
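
A minimal sketch of classical Gram-Schmidt, assuming NumPy and linearly independent input columns (the name `gram_schmidt` is hypothetical). It also shows why orthonormal bases are convenient for coordinates: if the columns of $Q$ are orthonormal and $x$ lies in their span, the coordinate vector of $x$ is simply $Q^T x$. For numerical work, modified Gram-Schmidt or `np.linalg.qr` is usually preferred, since the classical version can lose orthogonality in floating point.

```python
import numpy as np

def gram_schmidt(V):
    """Return Q with orthonormal columns spanning the same space as the columns of V.

    Classical Gram-Schmidt; assumes the columns of V are linearly independent.
    """
    V = np.asarray(V, dtype=float)
    Q = np.zeros_like(V)
    for j in range(V.shape[1]):
        v = V[:, j].copy()
        # Subtract the component of V[:, j] along each previously computed basis vector.
        for i in range(j):
            v -= (Q[:, i] @ V[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

V = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q = gram_schmidt(V)
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: the columns are orthonormal

# Coordinates of a vector in span(Q) with respect to the orthonormal basis.
x = V[:, 0] + 2 * V[:, 1]
c = Q.T @ x
print(np.allclose(Q @ c, x))  # True: x is recovered from its coordinate vector
```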

- Norms
  - Definition of a norm, and the properties that every norm must satisfy.
  - Understand and interpret geometrically the 1-norm, 2-norm, and infinity-norm. More generally, the p-norms.
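
A small pure-Python sketch of the p-norms (the name `p_norm` is hypothetical). Geometrically, in two dimensions the unit ball of the 1-norm is a diamond, of the 2-norm a circle, and of the infinity-norm a square.

```python
import math

def p_norm(x, p):
    """The p-norm of a sequence of numbers, for p >= 1 or p = math.inf."""
    if p == math.inf:
        return max(abs(xi) for xi in x)  # infinity-norm: largest absolute entry
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0]
print(p_norm(x, 1))         # 7.0  (sum of absolute values)
print(p_norm(x, 2))         # 5.0  (Euclidean length)
print(p_norm(x, math.inf))  # 4.0  (largest absolute entry)
```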

- Special matrices. Know the properties and common use cases for each. For example, what can we say about the eigenvalues/eigenvectors of each special matrix below? How do we multiply these matrices or take powers of them? What can we say about their rank?
  - Symmetric / Hermitian matrices
  - Orthogonal / unitary matrices
  - Diagonal matrices
  - Triangular matrices
  - Positive-definite and positive-semidefinite matrices
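
As one illustration, assuming NumPy: a real symmetric matrix has real eigenvalues and an orthonormal basis of eigenvectors, so $S = Q \Lambda Q^T$, and it is positive definite exactly when all of its eigenvalues are positive. The helper `is_positive_definite` is a hypothetical name; `np.linalg.eigh` is NumPy's routine for symmetric/Hermitian input and returns eigenvalues in ascending order.

```python
import numpy as np

def is_positive_definite(S, tol=1e-12):
    """Check positive definiteness of a symmetric matrix via its eigenvalues."""
    if not np.allclose(S, S.T):
        raise ValueError("expected a symmetric matrix")
    return bool(np.all(np.linalg.eigh(S)[0] > tol))

S = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
eigenvalues, Q = np.linalg.eigh(S)
print(eigenvalues)                                   # eigenvalues 1 and 3, ascending
print(np.allclose(Q @ np.diag(eigenvalues) @ Q.T, S))  # True: S = Q Lambda Q^T
print(is_positive_definite(S))                       # True
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False: eigenvalues -1 and 3
```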

- Eigenvalues / eigenvectors
  - Geometric interpretation
  - Matrix diagonalization. Which matrices are diagonalizable, and how can you tell?
  - Spectral theorem
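
A short sketch of diagonalization, assuming NumPy. An $n \times n$ matrix is diagonalizable exactly when it has $n$ linearly independent eigenvectors; having $n$ distinct eigenvalues is sufficient (but not necessary), whereas $\begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}$ is a standard non-diagonalizable example. Once $A = V \Lambda V^{-1}$, powers become cheap: $A^k = V \Lambda^k V^{-1}$.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigenvalues, V = np.linalg.eig(A)  # columns of V are eigenvectors
Lambda = np.diag(eigenvalues)

# A has two distinct eigenvalues (2 and 5), so V is invertible and A = V Lambda V^{-1}.
print(np.allclose(A, V @ Lambda @ np.linalg.inv(V)))  # True

# Powers in the eigenbasis: A^k = V Lambda^k V^{-1}.
k = 5
A_k = V @ np.diag(eigenvalues ** k) @ np.linalg.inv(V)
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))  # True
```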

- Singular value decomposition
  - Geometric interpretation (the image of the unit sphere under any $m \times n$ matrix is a hyperellipse)
  - What does the SVD look like for the special matrices listed above?
  - How can we interpret the singular values?
  - How does the SVD relate to the rank of a matrix?
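
A minimal sketch of the SVD facts above, assuming NumPy: the rank is the number of nonzero singular values, and the largest singular value is the matrix 2-norm, i.e. the longest semi-axis of the hyperellipse that the matrix maps the unit sphere onto. The tolerance below mirrors the default documented for `np.linalg.matrix_rank`.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s: singular values, descending
print(s)  # singular values 3 and 2

# The reduced SVD reconstructs A: A = U diag(s) V^T.
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True

# Rank = number of singular values above a tolerance (largest singular value * max(m, n) * eps).
tol = s.max() * max(A.shape) * np.finfo(float).eps
print(int(np.sum(s > tol)))  # 2

# The largest singular value equals the matrix 2-norm.
print(np.isclose(s[0], np.linalg.norm(A, 2)))  # True

# A rank-1 matrix: one singular value is (numerically) zero.
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.svd(B, compute_uv=False))  # approximately 5 and 0
```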

Helpful Resources:

- UNSW MATH 2501 Linear Algebra Notes on Canvas. These are very well-written with lots of examples. You should be comfortable with the material from the first 7-8 chapters.
- **Book:** "Linear Algebra Done Right" by Sheldon Axler
- **Notes:** "Matrix Differentiation" by RJ Barnes
- **CS229:** Linear Algebra Review and Reference

Learning Objectives:

- Essential concepts in probability
  - You should be able to work with both discrete distributions AND continuous distributions. Understand pdfs and cdfs.
  - What is the difference between a random variable and a distribution?
  - Independent random variables and events.
  - Common distributions and their properties (mean, variance, etc.)
    - Bernoulli
    - Binomial
    - Multinomial
    - Poisson
    - Categorical
    - Gaussian / Normal / Multivariate Normal
    - Beta
    - Dirichlet

- Random vectors (vectors whose elements are random variables).
- Expectation, variance, and covariance of random variables.
- Conditional expectation and variance.
- Functions of random variables (compute expectation of functions of random variables, etc.)
- Joint distributions. Marginalization. Marginal distributions.
- Bayes' theorem.
- Law of Total Probability
- You should be comfortable with conditional probability.
- Infinite sequences of random variables.
- Basic understanding of likelihood
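
A small worked example of Bayes' theorem, with the evidence term expanded by the law of total probability, $P(E) = P(E \mid H)P(H) + P(E \mid \neg H)P(\neg H)$. The function name `posterior` and the numbers are purely illustrative, not from any real test.

```python
def posterior(prior, likelihood_given_h, likelihood_given_not_h):
    """P(H | E) via Bayes' theorem for a binary hypothesis H.

    The evidence P(E) is computed with the law of total probability.
    """
    evidence = likelihood_given_h * prior + likelihood_given_not_h * (1 - prior)
    return likelihood_given_h * prior / evidence

# Made-up numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood_given_h=0.90, likelihood_given_not_h=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, the low prior keeps the posterior small, which is a common point of confusion with conditional probability.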

- Convex functions: definition and properties
- Constrained vs. unconstrained optimization
- Basic understanding of Lagrange duality

Helpful Resources:

- **CS229:** Review of Probability Theory
- **CS229:** Convex Optimization Overview
- **CS229:** Convex Optimization Overview Part II

Learning Objectives:

- Supervised learning regression problem
  - Understand the assumptions made by the model of linear regression
  - Understand the difference between input $x$ and features $\phi(x)$
  - Intuitively understand the least squares objective function
  - Use gradient descent to minimize the objective function
  - Use matrix algebra to compute the gradient of the objective
  - Pros and cons of batch vs. stochastic gradient descent
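
A minimal sketch of batch and stochastic gradient descent on the least squares objective, assuming NumPy. Here the objective is scaled as $J(\theta) = \frac{1}{2n}\|X\theta - y\|^2$ (the $1/n$ factor is a choice made for this sketch), so the batch gradient is $\frac{1}{n}X^T(X\theta - y)$, while the stochastic version steps along the gradient of a single example at a time. The function names, step sizes, and toy data are hypothetical.

```python
import numpy as np

def least_squares_gd(X, y, lr=0.1, num_iters=1000):
    """Batch gradient descent on J(theta) = ||X theta - y||^2 / (2n)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / n  # gradient over the whole dataset
        theta -= lr * grad
    return theta

def least_squares_sgd(X, y, lr=0.1, num_epochs=100, seed=0):
    """Stochastic gradient descent: one example per update, shuffled each epoch."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(num_epochs):
        for i in rng.permutation(len(y)):
            grad_i = X[i] * (X[i] @ theta - y[i])  # gradient of example i's squared error / 2
            theta -= lr * grad_i
    return theta

x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])  # bias feature plus the raw input
y = 1.0 + 2.0 * x                          # noiseless toy targets from y = 1 + 2x
theta = least_squares_gd(X, y, lr=0.5, num_iters=5000)
theta_sgd = least_squares_sgd(X, y, lr=0.1, num_epochs=300)
print(theta, theta_sgd)  # both close to [1, 2]
```

Each batch step touches every example, while each stochastic step is cheap but noisier; on this noiseless toy problem both reach the same solution.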

- Use the normal equations to find the least squares solution
  - Know how to derive the pseudoinverse
  - Understand the normal equations geometrically
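
A short sketch, assuming NumPy and a design matrix `X` with full column rank: solving the normal equations $X^T X \theta = X^T y$ gives the same answer as applying the pseudoinverse $X^+ = (X^T X)^{-1} X^T$. Geometrically, the residual $y - X\theta$ is orthogonal to the column space of $X$. The random data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Normal equations: X^T X theta = X^T y (solve rather than invert explicitly).
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Pseudoinverse solution.
theta_pinv = np.linalg.pinv(X) @ y
print(np.allclose(theta_normal, theta_pinv))  # True

# Geometric view: the residual is orthogonal to every column of X.
residual = y - X @ theta_normal
print(np.allclose(X.T @ residual, 0.0))  # True
```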

Helpful Resources:

- **Murphy, §7.1-7.3:** Linear Regression
- **Bishop, §1.1:** Polynomial Curve Fitting Example
- **Bishop, §3.1:** Linear Basis Function Models
- **CS229, Lecture Notes #1:** Part I, Linear Regression

Advanced Reading:

- **Blog Post:** Moritz Hardt, "The Zen of Gradient Descent"

Learning Objectives:

- Overfitting and the need for regularization
  - Write the objective function for lasso and ridge regression
  - Use matrix calculus to find the gradient of the regularized objective
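
A minimal ridge regression sketch, assuming NumPy and the objective $\|X\theta - y\|^2 + \lambda\|\theta\|^2$. Setting its gradient $2X^T(X\theta - y) + 2\lambda\theta$ to zero gives $(X^T X + \lambda I)\theta = X^T y$, which is solvable for any $\lambda > 0$. Lasso replaces $\|\theta\|^2$ with $\|\theta\|_1$, which is not differentiable at zero, so it generally has no closed-form solution. The function name `ridge` and the data are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Minimizer of ||X theta - y||^2 + lam * ||theta||^2 via the regularized normal equations."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = rng.normal(size=30)
for lam in (0.0, 1.0, 100.0):
    print(lam, np.linalg.norm(ridge(X, y, lam)))  # the norm of theta shrinks as lam grows
```

In the probabilistic view, with Gaussian noise of variance $\sigma^2$ and a prior $\theta \sim \mathcal{N}(0, \tau^2 I)$, the MAP estimate is exactly this solution with $\lambda = \sigma^2/\tau^2$; a Laplace prior leads to lasso instead.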

- Understand the probabilistic interpretation of linear regression
  - MLE formulation of linear regression objective
  - MAP formulation of regularized linear regression
  - Different priors correspond to different regularization

- Understand probabilistic modeling at a high level
  - Goals and assumptions of maximum likelihood estimation
  - Goals and assumptions of MAP estimation
  - Priors, likelihoods, and posteriors

Helpful Resources:

- **Murphy, §7.5:** Ridge Regression

Advanced Reading, only if you're interested:

- **Murphy, §13.3:** $\ell_1$ Regularization Basics

Learning Objectives:

- Understand the supervised learning classification problem formulation
- Understand the probabilistic interpretation of logistic regression
- Know why we use the sigmoid function rather than a hard threshold
- Write down the likelihood function
- Be able to take the gradient of the negative log-likelihood objective

- Use Newton's method to find the maximum likelihood parameter estimate
  - Understand why Newton's method applies here
  - Understand the difference between gradient descent and Newton's method
  - Understand Newton's method geometrically for one-dimensional problems
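
A minimal sketch of Newton's method for logistic regression, assuming NumPy and labels $y_i \in \{0, 1\}$. The negative log-likelihood is convex and twice differentiable, with gradient $X^T(\sigma(X\theta) - y)$ and Hessian $X^T S X$ where $S = \operatorname{diag}(p_i(1 - p_i))$, so each step is $\theta \leftarrow \theta - H^{-1}\nabla$. In one dimension this is $\theta \leftarrow \theta - \ell'(\theta)/\ell''(\theta)$: jump to the minimum of the local quadratic approximation, whereas gradient descent takes a fixed-size step $\theta \leftarrow \theta - \eta\,\ell'(\theta)$. The function names and toy data are hypothetical, and the data are deliberately not linearly separable so that a finite MLE exists.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_newton(X, y, num_iters=20):
    """Fit logistic regression by Newton's method on the negative log-likelihood (NLL).

    gradient of NLL: X^T (p - y)
    Hessian of NLL:  X^T S X, with S = diag(p * (1 - p)) and p = sigmoid(X theta)
    Assumes the data are not linearly separable, so a finite MLE exists.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        # Newton step: solve H step = grad instead of inverting H.
        theta -= np.linalg.solve(hessian, grad)
    return theta

# Toy 1-D data with overlapping classes; the first column is a bias feature.
X = np.column_stack([np.ones(6), np.arange(6.0)])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
theta = logistic_newton(X, y)
print(theta)
print(X.T @ (sigmoid(X @ theta) - y))  # gradient is approximately zero at the MLE
```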

- Know that there is no closed-form solution for logistic regression
- Understand logistic regression as a linear classifier
- Know how logistic regression can be generalized to softmax regression for multiclass problems

Helpful Resources:

- **Murphy, §8.1-8.3:** Logistic Regression
- **Bishop, §4.2:** Probabilistic Generative Models
- **Bishop, §4.3:** Probabilistic Discriminative Models
- **CS229, Lecture Notes #1:** Part II, Classification & Logistic Regression
- **CS229, Supplemental Notes #1:** Binary Classification & Logistic Regression

Learning Objectives:

- Understand the difference between generative and discriminative classifiers
  - Know which models we've used belong to which category

- Naive Bayes classifiers
  - Understand the conditional independence assumption and implications
  - Know how to write down the likelihood function
  - Be able to compute MLE/MAP estimates by hand
  - Understand Laplace smoothing and the problem it solves
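
A small pure-Python sketch of a Bernoulli Naive Bayes classifier with Laplace smoothing (the function names and toy data are hypothetical). With smoothing constant $\alpha = 1$, each parameter is $\phi_{j \mid c} = \frac{\#\{x_j = 1, y = c\} + 1}{\#\{y = c\} + 2}$. Without smoothing, feature 0 below would get probability 0 under class 0, and any test point with $x_0 = 1$ would receive zero probability for that class (and $\log 0$ would fail). Scoring uses the conditional independence assumption: $\log p(y = c) + \sum_j \log p(x_j \mid y = c)$.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors (MLE) and per-feature Bernoulli parameters with Laplace smoothing."""
    classes = sorted(set(y))
    n_features = len(X[0])
    priors, phi = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(y)
        phi[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                  for j in range(n_features)]
    return priors, phi

def predict_log_joint(priors, phi, x):
    """log p(y = c) + sum_j log p(x_j | y = c) for each class c."""
    scores = {}
    for c in priors:
        s = math.log(priors[c])
        for j, xj in enumerate(x):
            s += math.log(phi[c][j] if xj else 1.0 - phi[c][j])
        scores[c] = s
    return scores

X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]
priors, phi = train_bernoulli_nb(X, y)
print(phi[0])  # [0.25, 0.5, 0.75]  (feature 0 would be 0.0 without smoothing)
scores = predict_log_joint(priors, phi, [1, 1, 1])
print(max(scores, key=scores.get))  # 1
```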

- Compare Logistic Regression and Naive Bayes

Helpful Resources:

- **Murphy, §3.1-3.4:** Generative Models for Discrete Data
- **Murphy, §3.5:** Naive Bayes Classifiers
- **Murphy, §8.6:** Generative vs. Discriminative Classifiers
- **CS229, Lecture Notes #2:** Part IV, Generative Learning Algorithms

Advanced Reading, only if you're interested:

- **Paper:** Ng & Jordan 2001, "On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes"
- **Paper:** Zhang, H., 2004. "The optimality of naive Bayes". AA, 1(2), p.3.
- **Paper:** Domingos, P. and Pazzani, M., 1997. "On the optimality of the simple Bayesian classifier under zero-one loss". Machine Learning, 29(2-3), pp.103-130.

Learning Objectives:

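- Find a linear classifier
- Maximize the margin
- Formulate margin maximization as an optimization problem with constraints
- Interpret the variables of the optimization problem
- Note that, unlike logistic regression, the SVM has no corresponding probabilistic model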
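
For linearly separable data with labels $y_i \in \{-1, +1\}$, fixing the scale of $(w, b)$ so that the closest points satisfy $y_i(w^\top x_i + b) = 1$ makes the geometric margin $1/\|w\|$. Maximizing it is then equivalent to the following constrained problem, written here in one common notation:

$$
\min_{w,\,b}\ \frac{1}{2}\|w\|^2
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,n
$$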

Helpful Resources:

- **Murphy, §14.5:** Support Vector Machines
- **CS229, Lecture Notes #3:** Part V, Support Vector Machines

Learning Objectives:

- Optimal soft-margin hyperplane objective
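
One common way to write the soft-margin primal problem, assuming labels $y_i \in \{-1, +1\}$ and a user-chosen trade-off constant $C > 0$. The slack variables $\xi_i$ let points violate the margin at a cost:

$$
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n
$$

At the optimum $\xi_i = \max\bigl(0,\ 1 - y_i(w^\top x_i + b)\bigr)$, so the problem is equivalent to the unconstrained hinge-loss form

$$
\min_{w,\,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i\,(w^\top x_i + b)\bigr)
$$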

Learning Objectives:

- Bias vs. variance
- Visualization of bias vs. variance
- Model complexity
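
For reference, a pointwise version of the decomposition (compare Bishop §3.2), assuming $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$ that is independent of the training set $D$, and a predictor $\hat f_D$ fit on $D$:

$$
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat f_D(x)\bigr)^2\right]
= \underbrace{\bigl(\mathbb{E}_D[\hat f_D(x)] - f(x)\bigr)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\bigl(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)]\bigr)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$

Increasing model complexity typically lowers the bias term and raises the variance term; the noise term cannot be reduced by any model.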

Helpful Resources:

- **Bishop, §3.2:** The Bias-Variance Decomposition

Learning Objectives:

- Entropy and mutual information
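
A small pure-Python sketch of entropy and mutual information in bits (the function names are hypothetical). Mutual information satisfies $I(X;Y) = H(X) + H(Y) - H(X,Y)$, and information gain in decision trees is the mutual information between a split feature and the label.

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p log2 p, in bits (terms with p = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X; Y) = sum_{x,y} p(x,y) log2(p(x,y) / (p(x) p(y))) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * math.log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

print(entropy([0.5, 0.5]))   # 1.0  (a fair coin)
print(entropy([0.25] * 4))   # 2.0  (uniform over four outcomes)
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0  (independent variables)
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0  (Y determines X)
```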

Helpful Resources:

- **Murphy, §2.8:** Information Theory
- **Murphy, §16.1-16.2:** Classification and Regression Trees
- **Bishop, §1.6:** Information Theory
- **Blog Post:** Aldo Cortesi, "Visualizing Entropy in Binary Files"