# Lecture 2. Linear Algebra Review, 2016-09-12

Learning Objectives:

• Basic matrix operations. You should be able to interpret all of these geometrically AND write down generic formulas for each.
  • Vector addition, subtraction, and scaling.
  • Matrix-vector multiplication, and interpretations in terms of the rows and columns of the matrix.
  • Matrix-matrix multiplication, and interpretations.
  • Dot products, including how they relate to projections and the angle between vectors. More generally, inner products.
  • Projection onto a vector or onto a subspace.
  • Matrix transposes. Transpose of products, transpose of sums.
• Basic matrix properties
  • Understand the rank of a matrix both algebraically and geometrically. What does it mean for a matrix to have full rank?
  • Be able to describe the image and kernel of a matrix using set notation.
  • Know all of the common equivalent conditions for a matrix to be invertible, in terms of the eigenvalues, the columns, the determinant, etc.
• Vector spaces
  • Definition of an abstract vector space / subspace.
  • Definition of linearly independent vectors
  • Definition of basis, span, and dimension
  • Orthogonality and orthogonal / orthonormal bases
  • Know how to compute the coordinate vector for an element of an abstract vector space, with respect to a known basis.
  • Gram-Schmidt orthogonalization (see the sketch after this list)
  • Know what a linear transformation is.
• Norms
  • Definition of a norm, and the properties that every norm must satisfy.
  • Understand and interpret geometrically the 1-norm, 2-norm, and infinity-norm. More generally, the p-norms.
• Special matrices. Know the properties and common use cases for each. For example, what can we say about the eigenvalues/eigenvectors for each special matrix below? How do we multiply these matrices or take powers? What can we say about the rank?
  • Symmetric / Hermitian matrices
  • Orthogonal / unitary matrices
  • Diagonal matrices
  • Triangular matrices
  • Positive-definite and positive-semidefinite matrices
• Eigenvalues / eigenvectors
  • Geometric interpretation
  • Matrix diagonalization. Which matrices are diagonalizable and how can you tell?
  • Spectral theorem
• Singular value decomposition
  • Geometric interpretation (the image of the unit sphere under any matrix is a hyperellipse)
  • What does the SVD look like for the special matrices listed above?
  • How can we interpret the singular values?
  • How does the SVD relate to the rank of a matrix?
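
For the Gram-Schmidt item above, a minimal NumPy sketch (the helper name `gram_schmidt` and the example matrix are illustrative, not from the lecture): it orthonormalizes the columns of a matrix with linearly independent columns, checks orthonormality, and uses `np.linalg.matrix_rank`, which computes the rank from the singular values.

```python
import numpy as np

def gram_schmidt(A):
    """Orthonormalize the columns of A (assumed linearly independent)
    using classical Gram-Schmidt."""
    A = np.asarray(A, dtype=float)
    Q = np.zeros_like(A)
    for j in range(A.shape[1]):
        v = A[:, j].copy()
        for i in range(j):
            # Subtract the projection of column j onto the already-built q_i.
            v -= (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 2.0]])
Q = gram_schmidt(A)
print(np.round(Q.T @ Q, 6))        # identity matrix: the columns are orthonormal
print(np.linalg.matrix_rank(A))    # 2, computed from the singular values
```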

# Lecture 3. Probability Review & Intro to Optimization, 2016-09-14

Learning Objectives:

• Essential concepts in probability
  • You should be able to work with both discrete distributions AND continuous distributions. Understand pdfs and cdfs.
  • What is the difference between a random variable and a distribution?
  • Independent random variables and events.
  • Common distributions and their properties (mean, variance, etc.)
    • Bernoulli
    • Binomial
    • Multinomial
    • Poisson
    • Categorical
    • Gaussian / Normal / Multivariate Normal
    • Beta
    • Dirichlet
  • Random vectors (vectors whose elements are random variables).
  • Expectation, variance, and covariance of random variables.
  • Conditional expectation and variance.
  • Functions of random variables (compute expectation of functions of random variables, etc.)
  • Joint distributions. Marginalization. Marginal distributions.
  • Bayes' theorem (see the worked example after this list).
  • Law of Total Probability
  • You should be comfortable with conditional probability.
  • Infinite sequences of random variables.
  • Basic understanding of likelihood
• Convex functions: definition and properties
• Constrained vs. unconstrained optimization
• Basic understanding of Lagrange duality
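
A small worked example of Bayes' theorem and the law of total probability from the list above; the diagnostic-test numbers are made up purely for illustration.

```python
# Hypothetical numbers for a diagnostic test, chosen only for illustration.
p_d = 0.01        # P(disease): prior / prevalence
p_pos_d = 0.95    # P(+ | disease): sensitivity
p_pos_nd = 0.05   # P(+ | no disease): false-positive rate

# Law of total probability: P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes' theorem: P(D | +) = P(+|D) P(D) / P(+)
p_d_pos = p_pos_d * p_d / p_pos
print(f"P(disease | positive) = {p_d_pos:.3f}")   # ~0.161: still unlikely despite a positive test
```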

# Lecture 4. Linear Regression, Part I, 2016-09-19

Learning Objectives:

• Supervised learning regression problem
• Understand the assumptions made by the model of linear regression
• Understand the difference between input $x$ and features $\phi(x)$
• Intuitively understand the least squares objective function
• Use gradient descent to minimize the objective function
• Use matrix algebra to compute the gradient of the objective
• Pros and cons of batch vs. stochastic gradient descent
• Use the normal equations to find the least squares solution (sketched after this list)
• Know how to derive the pseudoinverse
• Understand the normal equations geometrically
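
A minimal sketch of the two fitting approaches above on synthetic data (the data-generating weights and step size are arbitrary choices, not from the lecture): batch gradient descent on the least squares objective, followed by the closed-form solution from the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=100)   # synthetic data, arbitrary true weights

# Features phi(x) = [1, x]: a bias term plus the raw input.
Phi = np.column_stack([np.ones_like(x), x])

# Batch gradient descent on J(w) = (1/2n) ||Phi w - y||^2.
w = np.zeros(2)
lr = 0.1
for _ in range(2000):
    grad = Phi.T @ (Phi @ w - y) / len(y)   # gradient of the averaged least squares objective
    w -= lr * grad

# Normal equations: w = (Phi^T Phi)^{-1} Phi^T y (the pseudoinverse applied to y).
w_closed = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

print(w)          # close to [1.0, 2.0]
print(w_closed)   # essentially the same solution
```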

• Murphy, §7.1-7.3: Linear Regression
• Bishop, §1.1: Polynomial Curve Fitting Example
• Bishop, §3.1: Linear Basis Function Models
• CS229, Lecture Notes #1: Part I, Linear Regression

# Lecture 5. Linear Regression, Part II, 2016-09-21

Learning Objectives:

• Overfitting and the need for regularization
• Write the objective function for lasso and ridge regression (ridge is sketched after this list)
• Use matrix calculus to find the gradient of the regularized objective
• Understand the probabilistic interpretation of linear regression
  • MLE formulation of the linear regression objective
  • MAP formulation of regularized linear regression
  • Different priors correspond to different regularization penalties
• Understand probabilistic modeling at a high level
  • Goals and assumptions of maximum likelihood estimation
  • Goals and assumptions of MAP estimation
  • Priors, likelihoods, and posteriors
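
A short sketch of the ridge case above (function names are illustrative): the regularized objective $\|\Phi w - y\|^2 + \lambda\|w\|^2$, its gradient, and its closed-form minimizer.

```python
import numpy as np

def ridge_grad(Phi, y, w, lam):
    """Gradient of the ridge objective ||Phi w - y||^2 + lam ||w||^2."""
    return 2 * Phi.T @ (Phi @ w - y) + 2 * lam * w

def ridge_fit(Phi, y, lam):
    """Closed-form minimizer: w = (Phi^T Phi + lam I)^{-1} Phi^T y."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)
```

Under the probabilistic interpretation, this minimizer is the MAP estimate for a Gaussian likelihood with a zero-mean Gaussian prior on $w$; lasso swaps the squared penalty for $\lambda\|w\|_1$ and has no closed-form solution.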

• Murphy, §7.5: Ridge Regression

Advanced Reading, only if you're interested:

• Murphy, §13.3: $\ell_1$ Regularization Basics
• Murphy, §13.4: $\ell_1$ Regularization Algorithms

# Lecture 6. Logistic Regression, 2016-09-26

Learning Objectives:

• Understand the supervised learning classification problem formulation
• Understand the probabilistic interpretation of logistic regression
• Know why we use the sigmoid function rather than a hard threshold
• Write down the likelihood function
• Be able to take the gradient of the negative log-likelihood objective
• Use Newton's method to find the maximum likelihood parameter estimate (sketched after this list)
• Understand why Newton's method applies here
• Understand the difference between gradient descent and Newton's method
• Understand Newton's method geometrically for one-dimensional problems
• Know that there is no closed-form solution for logistic regression
• Understand logistic regression as a linear classifier
• Know how logistic regression can be generalized to softmax regression for multiclass problems
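
A minimal sketch of the Newton's method item above (names are illustrative; it assumes the Hessian stays invertible, which can fail when the classes are perfectly separable, in which case a regularization term is usually added).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logreg_newton(X, y, n_iter=10):
    """Logistic regression fit by Newton's method.
    X: (n, d) design matrix, y: (n,) labels in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)              # predicted probabilities
        grad = X.T @ (p - y)            # gradient of the negative log-likelihood
        s = p * (1 - p)                 # per-example weights
        H = X.T @ (X * s[:, None])      # Hessian: X^T diag(s) X
        w -= np.linalg.solve(H, grad)   # Newton step
    return w
```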

# Lecture 7. Naive Bayes, 2016-09-28

Learning Objectives:

• Understand the difference between generative and discriminative classifiers
• Know which models we've used belong to which category
• Naive Bayes Classifiers
  • Understand the conditional independence assumption and its implications
  • Know how to write down the likelihood function
  • Be able to compute MLE/MAP estimates by hand
  • Understand Laplace smoothing and the problem it solves (see the sketch after this list)
• Compare Logistic Regression and Naive Bayes
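
One concrete instance of the above, sketched under the assumption of binary features: a Bernoulli naive Bayes classifier with Laplace (add-$\alpha$) smoothing. The function names are illustrative.

```python
import numpy as np

def nb_fit(X, y, alpha=1.0):
    """Bernoulli naive Bayes with Laplace (add-alpha) smoothing.
    X: (n, d) binary features, y: (n,) labels in {0, ..., C-1}."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    # Smoothed estimate of P(x_j = 1 | y = c); alpha = 0 recovers the plain MLE.
    theta = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                      for c in classes])
    return classes, priors, theta

def nb_predict(X, classes, priors, theta):
    # log P(y = c) + sum_j [ x_j log theta_cj + (1 - x_j) log(1 - theta_cj) ]
    log_post = (np.log(priors)
                + X @ np.log(theta).T
                + (1 - X) @ np.log(1 - theta).T)
    return classes[np.argmax(log_post, axis=1)]
```

The smoothing prevents a feature value never observed in some class from forcing an estimated probability of zero, which would otherwise make that class's likelihood zero for any example containing it.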

• Murphy, §3.1-3.4: Generative Models for Discrete Data
• Murphy, §3.5: Naive Bayes Classifiers
• Murphy, §8.6: Generative vs. Discriminative Classifiers
• CS229, Lecture Notes #2: Part IV, Generative Learning Algorithms

# Lecture 8. Support Vector Machines, Part I, 2016-10-03

Learning Objectives:

• Formulate learning a linear classifier as an optimization problem
• Understand the goal of maximizing the margin
• Derive the resulting constrained optimization problem (see the formulation after this list)
• Understand the geometric meaning of the variables (weight vector, bias, and margin)
• Recognize that, unlike earlier models, there is no corresponding probabilistic model
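
Putting these pieces together (assuming labels $y_i \in \{-1, +1\}$ and linearly separable training data), maximizing the margin $2/\|w\|$ is equivalent to the constrained problem

$$
\min_{w,\,b} \ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, n.
$$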

# Lecture 9. Support Vector Machines, Part II, 2016-10-05

Learning Objectives:

• Write down the objective for the optimal soft-margin hyperplane (shown below)
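
For reference, the objective referred to above, with slack variables $\xi_i$ allowing margin violations and a parameter $C > 0$ trading off margin size against violations:

$$
\min_{w,\,b,\,\xi} \ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.
$$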

# Lecture 10. Bias-Variance Tradeoff, 2016-10-10

Learning Objectives:

• Understand the tradeoff between bias and variance
• Visualize the bias-variance tradeoff
• Understand how model complexity affects bias and variance
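
The decomposition behind these objectives, for squared error with $y = f(x) + \varepsilon$, $\mathbb{E}[\varepsilon] = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ fit on a random training set:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}.
$$

More complex models typically have lower bias and higher variance; good model selection balances the two terms.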

• Bishop, §3.2: The Bias-Variance Decomposition

# Lecture 11. Decision Trees & Information Theory, 2016-10-12

Learning Objectives:

• Entropy and mutual information
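
For reference, the two quantities for discrete random variables $X$ and $Y$:

$$
H(X) = -\sum_{x} p(x)\log p(x),
\qquad
I(X;Y) = \sum_{x,\,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X \mid Y).
$$

In decision tree learning, the information gain of a candidate split is the mutual information between the splitting feature and the label.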