# Lecture 2. Linear Algebra Review, 2016-09-12

Learning Objectives:

• Basic matrix operations. You should be able to interpret all of these geometrically AND write down generic formulas for each.
  • Vector addition, subtraction, and scaling.
  • Matrix-vector multiplication, and interpretations in terms of the rows and columns of the matrix.
  • Matrix-matrix multiplication, and interpretations.
  • Dot products, including how they relate to projections and the angle between vectors. More generally, inner products.
  • Projection onto a vector or onto a subspace.
  • Matrix transposes. Transpose of products, transpose of sums.
• Basic matrix properties
  • Understand the rank of a matrix both algebraically and geometrically. What does it mean for a matrix to have full rank?
  • Be able to describe the image and kernel of a matrix using set notation.
  • Know all of the common equivalent conditions for a matrix to be invertible, in terms of the eigenvalues, the columns, the determinant, etc.
• Vector spaces
  • Definition of an abstract vector space / subspace.
  • Definition of linearly independent vectors
  • Definition of basis, span, and dimension
  • Orthogonality and orthogonal / orthonormal bases
  • Know how to compute the coordinate vector for an element of an abstract vector space, with respect to a known basis.
  • Gram-Schmidt orthogonalization (see the sketch after this list)
  • Know what a linear transformation is.
• Norms
  • Definition of a norm, and the properties that every norm must satisfy.
  • Understand and interpret geometrically the 1-norm, 2-norm, and infinity-norm. More generally, the p-norms.
• Special matrices. Know the properties and common use cases for each. For example, what can we say about the eigenvalues/eigenvectors for each special matrix below? How do we multiply these matrices or take powers? What can we say about the rank?
  • Symmetric / Hermitian matrices
  • Orthogonal / unitary matrices
  • Diagonal matrices
  • Triangular matrices
  • Positive-definite and positive-semidefinite matrices
• Eigenvalues / eigenvectors
  • Geometric interpretation
  • Matrix diagonalization. Which matrices are diagonalizable and how can you tell?
  • Spectral theorem
• Singular value decomposition
  • Geometric interpretation (the image of the unit sphere under any matrix is a hyperellipse)
  • What does the SVD look like for the special matrices listed above?
  • How can we interpret the singular values?
  • How does the SVD relate to the rank of a matrix?
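
For the Gram-Schmidt item above, a minimal NumPy sketch (the helper name `gram_schmidt` and the example matrix are illustrative, not from the lecture): it orthonormalizes the columns of a matrix with linearly independent columns, checks orthonormality, and uses `np.linalg.matrix_rank`, which computes the rank from the singular values.

```python
import numpy as np

def gram_schmidt(A):
    """Orthonormalize the columns of A (assumed linearly independent)
    using classical Gram-Schmidt."""
    A = np.asarray(A, dtype=float)
    Q = np.zeros_like(A)
    for j in range(A.shape[1]):
        v = A[:, j].copy()
        for i in range(j):
            # Subtract the projection of column j onto the already-built q_i.
            v -= (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 2.0]])
Q = gram_schmidt(A)
print(np.round(Q.T @ Q, 6))        # identity matrix: the columns are orthonormal
print(np.linalg.matrix_rank(A))    # 2, computed from the singular values
```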

# Lecture 3. Probability Review & Intro to Optimization, 2016-09-14

Learning Objectives:

• Essential concepts in probability
  • You should be able to work with both discrete distributions AND continuous distributions. Understand pdfs and cdfs.
  • What is the difference between a random variable and a distribution?
  • Independent random variables and events.
  • Common distributions and their properties (mean, variance, etc.)
    • Bernoulli
    • Binomial
    • Multinomial
    • Poisson
    • Categorical
    • Gaussian / Normal / Multivariate Normal
    • Beta
    • Dirichlet
  • Random vectors (vectors whose elements are random variables).
  • Expectation, variance, and covariance of random variables.
  • Conditional expectation and variance.
  • Functions of random variables (compute expectation of functions of random variables, etc.)
  • Joint distributions. Marginalization. Marginal distributions.
  • Bayes' theorem (see the worked example after this list).
  • Law of Total Probability
  • You should be comfortable with conditional probability.
  • Infinite sequences of random variables.
  • Basic understanding of likelihood
• Convex functions: definition and properties
• Constrained vs. unconstrained optimization
• Basic understanding of Lagrange duality
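
A small worked example of Bayes' theorem and the law of total probability from the list above; the diagnostic-test numbers are made up purely for illustration.

```python
# Hypothetical numbers for a diagnostic test, chosen only for illustration.
p_d = 0.01        # P(disease): prior / prevalence
p_pos_d = 0.95    # P(+ | disease): sensitivity
p_pos_nd = 0.05   # P(+ | no disease): false-positive rate

# Law of total probability: P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes' theorem: P(D | +) = P(+|D) P(D) / P(+)
p_d_pos = p_pos_d * p_d / p_pos
print(f"P(disease | positive) = {p_d_pos:.3f}")   # ~0.161: still unlikely despite a positive test
```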

# Lecture 4. Linear Regression, Part I, 2016-09-19

Learning Objectives:

• Supervised learning regression problem
• Understand the assumptions made by the model of linear regression
• Understand the difference between input $x$ and features $\phi(x)$
• Intuitively understand the least squares objective function
• Use gradient descent to minimize the objective function
• Use matrix algebra to compute the gradient of the objective
• Pros and cons of batch vs. stochastic gradient descent
• Use the normal equations to find the least squares solution (sketched after this list)
• Know how to derive the pseudoinverse
• Understand the normal equations geometrically
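
A minimal sketch of the two fitting approaches above on synthetic data (the data-generating weights and step size are arbitrary choices, not from the lecture): batch gradient descent on the least squares objective, followed by the closed-form solution from the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=100)   # synthetic data, arbitrary true weights

# Features phi(x) = [1, x]: a bias term plus the raw input.
Phi = np.column_stack([np.ones_like(x), x])

# Batch gradient descent on J(w) = (1/2n) ||Phi w - y||^2.
w = np.zeros(2)
lr = 0.1
for _ in range(2000):
    grad = Phi.T @ (Phi @ w - y) / len(y)   # gradient of the averaged least squares objective
    w -= lr * grad

# Normal equations: w = (Phi^T Phi)^{-1} Phi^T y (the pseudoinverse applied to y).
w_closed = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

print(w)          # close to [1.0, 2.0]
print(w_closed)   # essentially the same solution
```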

• Murphy, §7.1-7.3: Linear Regression
• Bishop, §1.1: Polynomial Curve Fitting Example
• Bishop, §3.1: Linear Basis Function Models
• CS229, Lecture Notes #1: Part I, Linear Regression

# Lecture 5. Linear Regression, Part II, 2016-09-21

Learning Objectives:

• Overfitting and the need for regularization
• Write the objective function for lasso and ridge regression (ridge is sketched after this list)
• Use matrix calculus to find the gradient of the regularized objective
• Understand the probabilistic interpretation of linear regression
  • MLE formulation of the linear regression objective
  • MAP formulation of regularized linear regression
  • Different priors correspond to different regularization penalties
• Understand probabilistic modeling at a high level
  • Goals and assumptions of maximum likelihood estimation
  • Goals and assumptions of MAP estimation
  • Priors, likelihoods, and posteriors
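
A short sketch of the ridge case above (function names are illustrative): the regularized objective $\|\Phi w - y\|^2 + \lambda\|w\|^2$, its gradient, and its closed-form minimizer.

```python
import numpy as np

def ridge_grad(Phi, y, w, lam):
    """Gradient of the ridge objective ||Phi w - y||^2 + lam ||w||^2."""
    return 2 * Phi.T @ (Phi @ w - y) + 2 * lam * w

def ridge_fit(Phi, y, lam):
    """Closed-form minimizer: w = (Phi^T Phi + lam I)^{-1} Phi^T y."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)
```

Under the probabilistic interpretation, this minimizer is the MAP estimate for a Gaussian likelihood with a zero-mean Gaussian prior on $w$; lasso swaps the squared penalty for $\lambda\|w\|_1$ and has no closed-form solution.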

• Murphy, §7.5: Ridge Regression

Advanced Reading, only if you're interested:

• Murphy, §13.3: $\ell_1$ Regularization Basics
• Murphy, §13.4: $\ell_1$ Regularization Algorithms

# Lecture 6. Logistic Regression, 2016-09-26

Learning Objectives:

• Understand the supervised learning classification problem formulation
• Understand the probabilistic interpretation of logistic regression
• Know why we use the sigmoid function rather than a hard threshold
• Write down the likelihood function
• Be able to take the gradient of the negative log-likelihood objective
• Use Newton's method to find the maximum likelihood parameter estimate (sketched after this list)
• Understand why Newton's method applies here
• Understand the difference between gradient descent and Newton's method
• Understand Newton's method geometrically for one-dimensional problems
• Know that there is no closed-form solution for logistic regression
• Understand logistic regression as a linear classifier
• Know how logistic regression can be generalized to softmax regression for multiclass problems
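
A minimal sketch of the Newton's method item above (names are illustrative; it assumes the Hessian stays invertible, which can fail when the classes are perfectly separable, in which case a regularization term is usually added).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logreg_newton(X, y, n_iter=10):
    """Logistic regression fit by Newton's method.
    X: (n, d) design matrix, y: (n,) labels in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)              # predicted probabilities
        grad = X.T @ (p - y)            # gradient of the negative log-likelihood
        s = p * (1 - p)                 # per-example weights
        H = X.T @ (X * s[:, None])      # Hessian: X^T diag(s) X
        w -= np.linalg.solve(H, grad)   # Newton step
    return w
```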

# Lecture 7. Naive Bayes, 2016-09-28

Learning Objectives:

• Understand the difference between generative and discriminative classifiers
• Know which models we've used belong to which category
• Naive Bayes Classifiers
  • Understand the conditional independence assumption and its implications
  • Know how to write down the likelihood function
  • Be able to compute MLE/MAP estimates by hand
  • Understand Laplace smoothing and the problem it solves (see the sketch after this list)
• Compare Logistic Regression and Naive Bayes
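
One concrete instance of the above, sketched under the assumption of binary features: a Bernoulli naive Bayes classifier with Laplace (add-$\alpha$) smoothing. The function names are illustrative.

```python
import numpy as np

def nb_fit(X, y, alpha=1.0):
    """Bernoulli naive Bayes with Laplace (add-alpha) smoothing.
    X: (n, d) binary features, y: (n,) labels in {0, ..., C-1}."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    # Smoothed estimate of P(x_j = 1 | y = c); alpha = 0 recovers the plain MLE.
    theta = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                      for c in classes])
    return classes, priors, theta

def nb_predict(X, classes, priors, theta):
    # log P(y = c) + sum_j [ x_j log theta_cj + (1 - x_j) log(1 - theta_cj) ]
    log_post = (np.log(priors)
                + X @ np.log(theta).T
                + (1 - X) @ np.log(1 - theta).T)
    return classes[np.argmax(log_post, axis=1)]
```

The smoothing prevents a feature value never observed in some class from forcing an estimated probability of zero, which would otherwise make that class's likelihood zero for any example containing it.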

• Murphy, §3.1-3.4: Generative Models for Discrete Data
• Murphy, §3.5: Naive Bayes Classifiers
• Murphy, §8.6: Generative vs. Discriminative Classifiers
• CS229, Lecture Notes #2: Part IV, Generative Learning Algorithms

# Lecture 8. Support Vector Machines, Part I, 2016-10-03

Learning Objectives:

• Formulate learning a linear classifier as an optimization problem
• Understand the goal of maximizing the margin
• Derive the resulting constrained optimization problem (see the formulation after this list)
• Understand the geometric meaning of the variables (weight vector, bias, and margin)
• Recognize that, unlike earlier models, there is no corresponding probabilistic model
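
Putting these pieces together (assuming labels $y_i \in \{-1, +1\}$ and linearly separable training data), maximizing the margin $2/\|w\|$ is equivalent to the constrained problem

$$
\min_{w,\,b} \ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, n.
$$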

# Lecture 9. Support Vector Machines, Part II, 2016-10-05

Learning Objectives:

• Write down the objective for the optimal soft-margin hyperplane (shown below)
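
For reference, the objective referred to above, with slack variables $\xi_i$ allowing margin violations and a parameter $C > 0$ trading off margin size against violations:

$$
\min_{w,\,b,\,\xi} \ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.
$$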

# Lecture 10. Bias-Variance Tradeoff, 2016-10-10

Learning Objectives:

• Understand the tradeoff between bias and variance
• Visualize the bias-variance tradeoff
• Understand how model complexity affects bias and variance
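
The decomposition behind these objectives, for squared error with $y = f(x) + \varepsilon$, $\mathbb{E}[\varepsilon] = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ fit on a random training set:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}.
$$

More complex models typically have lower bias and higher variance; good model selection balances the two terms.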

• Bishop, §3.2: The Bias-Variance Decomposition

# Lecture 11. Decision Trees & Information Theory, 2016-10-12

Learning Objectives:

• Entropy and mutual information
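
For reference, the two quantities for discrete random variables $X$ and $Y$:

$$
H(X) = -\sum_{x} p(x)\log p(x),
\qquad
I(X;Y) = \sum_{x,\,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X \mid Y).
$$

In decision tree learning, the information gain of a candidate split is the mutual information between the splitting feature and the label.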