Lecture 1. Introduction to Machine Learning, 2016-09-07
Lecture 2. Linear Algebra Review, 2016-09-12
- Basic matrix operations. You should be able to interpret all of these geometrically AND write down generic formulas for each.
- Vector addition, subtraction, and scaling.
- Matrix-vector multiplication, and interpretations in terms of the rows and columns of the matrix.
- Matrix-matrix multiplication, and interpretations.
- Dot products, including how they relate to projections and the angle between vectors. More generally, inner products.
- Projection onto a vector or onto a subspace.
- Matrix transposes. Transpose of products, transpose of sums.
- Basic matrix properties
- Understand the rank of a matrix both algebraically and geometrically. What does it mean for a matrix to have full rank?
- Be able to describe the image and kernel of a matrix using set notation.
- Know all of the common equivalent conditions for a matrix to be invertible, in terms of the eigenvalues, the columns, the determinant, etc.
- Vector spaces
- Definition of an abstract vector space / subspace.
- Definition of linearly independent vectors
- Definition of basis, span, and dimension
- Orthogonality and orthogonal / orthonormal bases
- Know how to compute the coordinate vector for an element of an abstract vector space, with respect to a known basis.
- Gram-Schmidt orthogonalization (a numpy sketch follows this list)
- Know what a linear transformation is.
- Definition of a norm, and the properties that every norm must satisfy.
- Understand and interpret geometrically the 1-norm, 2-norm, and infinity-norm. More generally, the p-norms.
- Special matrices. Know the properties and common use cases for each. For example, what can we say about the eigenvalues/eigenvectors for each special matrix below? How do we multiply these matrices or take powers? What can we say about the rank?
- Symmetric / Hermitian matrices.
- Orthogonal / unitary matrices
- Diagonal matrices
- Triangular matrices
- Positive-definite and positive-semidefinite matrices
- Eigenvalues / eigenvectors
- Geometric interpretation
- Matrix diagonalization. Which matrices are diagonalizable and how can you tell?
- Spectral theorem
- Singular value decomposition (a numpy sketch follows this list)
- Geometric interpretation (the image of the unit sphere under any matrix is a hyperellipse)
- What does the SVD look like for the special matrices listed above?
- How can we interpret the singular values?
- How does the SVD relate to the rank of a matrix?
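Many of these operations are easy to check numerically. A minimal sketch of classical Gram-Schmidt in numpy (the matrix `A` and the helper name `gram_schmidt` are ours, not from the lecture; in practice `np.linalg.qr` is the numerically stable route):

```python
import numpy as np

def gram_schmidt(A):
    """Orthonormalize the columns of A (assumed linearly independent)."""
    Q = np.zeros_like(A, dtype=float)
    for j in range(A.shape[1]):
        v = A[:, j].astype(float)
        for i in range(j):
            # Subtract the projection of column j onto each earlier q_i.
            v -= (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 2.0]])  # made-up example
Q = gram_schmidt(A)
print(np.round(Q.T @ Q, 10))  # ~ identity, so the columns are orthonormal
```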
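Likewise, a short sketch connecting the SVD to the singular values, the hyperellipse picture, and the rank (again with a made-up matrix):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [4.0, 4.0]])  # made-up 3x2 matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The singular values are the semi-axis lengths of the hyperellipse
# obtained by mapping the unit sphere through A.
print("singular values:", s)

# Numerical rank = number of singular values above a tolerance.
tol = max(A.shape) * np.finfo(float).eps * s.max()
print("rank:", int(np.sum(s > tol)), "==", np.linalg.matrix_rank(A))
```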
Lecture 3. Probability Review & Intro to Optimization, 2016-09-14
- Essential concepts in probability
- You should be able to work with both discrete distributions AND continuous distributions. Understand pdfs and cdfs.
- What is the difference between a random variable and a distribution?
- Independent random variables and events.
- Common distributions and their properties (mean, variance, etc.)
- Gaussian / Normal / Multivariate Normal
- Random vectors (vectors whose elements are random variables)
- Expectation, variance, and covariance of random variables.
- Conditional expectation and variance.
- Functions of random variables (compute expectation of functions of random variables, etc.)
- Joint distributions. Marginalization. Marginal distributions.
- Bayes' theorem (a worked sketch follows this list).
- Law of Total Probability
- You should be comfortable with conditional probability.
- Infinite sequences of random variables.
- Basic understanding of likelihood
- Convex functions: definition and properties
- Constrained vs. unconstrained optimization
- Basic understanding of Lagrange duality
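Bayes' theorem and the law of total probability are easy to sanity-check numerically. A toy sketch with made-up numbers (a diagnostic test with 1% prevalence, 95% sensitivity, 90% specificity):

```python
# All numbers are made up: P(D=1) = 0.01 (prevalence),
# P(T=1 | D=1) = 0.95 (sensitivity), P(T=0 | D=0) = 0.90 (specificity).
p_d = 0.01
p_t_given_d = 0.95
p_t_given_not_d = 0.10

# Law of total probability: P(T=1) = P(T=1|D=1) P(D=1) + P(T=1|D=0) P(D=0)
p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

# Bayes' theorem: P(D=1 | T=1) = P(T=1|D=1) P(D=1) / P(T=1)
posterior = p_t_given_d * p_d / p_t
print(f"P(disease | positive test) = {posterior:.3f}")  # ~0.088
```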
Lecture 4. Linear Regression, Part I, 2016-09-19
- Supervised learning regression problem
- Understand the assumptions made by the model of linear regression
- Understand the difference between input $x$ and features $\phi(x)$
- Intuitively understand the least squares objective function
- Use gradient descent to minimize the objective function
- Use matrix algebra to compute the gradient of the objective
- Pros and cons of batch vs. stochastic gradient descent
- Use the normal equations to find the least squares solution (a numpy sketch follows this list)
- Know how to derive the pseudoinverse
- Understand the normal equations geometrically
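A minimal numpy sketch of the least squares solution via the normal equations $X^\top X w = X^\top y$ (the data is made up; `np.linalg.lstsq`, which works through the pseudoinverse, is the numerically preferred route):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # bias column + one feature
y = 2.0 + 3.0 * X[:, 1] + 0.1 * rng.normal(size=50)      # made-up targets

# Normal equations: (X^T X) w = X^T y  (solve the system; don't invert explicitly)
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same answer via the pseudoinverse route used by lstsq
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_normal, w_lstsq)  # both ~ [2, 3]
```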
- Murphy, §7.1-7.3: Linear Regression
- Bishop, §1.1: Polynomial Curve Fitting Example
- Bishop, §3.1: Linear Basis Function Models
- CS229, Lecture Notes #1: Part I, Linear Regression
Lecture 5. Linear Regression, Part II, 2016-09-21
- Overfitting and the need for regularization
- Write the objective function for lasso and ridge regression (a ridge sketch follows this list)
- Use matrix calculus to find the gradient of the regularized objective
- Understand the probabilistic interpretation of linear regression
- MLE formulation of linear regression objective
- MAP formulation of regularized linear regression
- Different priors correspond to different regularization
- Understand probabilistic modeling at a high level
- Goals and assumptions of maximum likelihood estimation
- Goals and assumptions of MAP estimation
- Priors, likelihoods, and posteriors
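A self-contained numpy sketch of the ridge closed form $w = (X^\top X + \lambda I)^{-1} X^\top y$ (the data and $\lambda$ are made up; lasso has no closed form and needs an iterative solver such as coordinate descent):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)  # made-up data

lam = 0.1  # regularization strength (made-up value)
d = X.shape[1]
# Ridge: minimize ||Xw - y||^2 + lam ||w||^2  =>  (X^T X + lam I) w = X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(w_ridge)  # shrunk toward zero relative to the unregularized solution
```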
- Murphy, §7.5: Ridge Regression
Advanced Reading, only if you're interested:
- Murphy, §13.3: $\ell_1$ Regularization Basics
- Murphy, §13.4: $\ell_1$ Regularization Algorithms
Lecture 6. Logistic Regression, 2016-09-26
- Understand the supervised learning classification problem formulation
- Understand the probabilistic interpretation of logistic regression
- Know why we use the sigmoid function rather than a hard threshold
- Write down the likelihood function
- Be able to take the gradient of the negative log-likelihood objective
- Use Newton's method to find the maximum likelihood parameter estimate (a sketch follows this list)
- Understand why Newton's method applies here
- Understand the difference between gradient descent and Newton's method
- Understand Newton's method geometrically for one-dimensional problems
- Know that there is no closed-form solution for logistic regression
- Understand logistic regression as a linear classifier
- Know how logistic regression can be generalized to softmax regression for multiclass problems
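A minimal sketch of Newton's method applied to the logistic regression negative log-likelihood, assuming the sigmoid model from the lecture (the data is made up; real code would add regularization and a line search):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
w_true = np.array([0.5, 2.0, -1.0])                        # made-up parameters
y = (rng.random(100) < sigmoid(X @ w_true)).astype(float)  # noisy labels

w = np.zeros(3)
for _ in range(10):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y)            # gradient of the negative log-likelihood
    S = p * (1 - p)                 # Bernoulli variances on the Hessian diagonal
    H = X.T @ (X * S[:, None])      # Hessian: X^T diag(S) X
    w -= np.linalg.solve(H, grad)   # Newton step
print(w)  # approaches w_true
```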
Lecture 7. Naive Bayes, 2016-09-28
- Understand the difference between generative and discriminative classifiers
- Know which models we've used belong to which category
- Naive Bayes Classifiers
- Understand the conditional independence assumption and implications
- Know how to write down the likelihood function
- Be able to compute MLE/MAP estimates by hand
- Understand Laplace smoothing and the problem it solves (a sketch follows this list)
- Compare Logistic Regression and Naive Bayes
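A minimal sketch of Bernoulli naive Bayes with Laplace (add-one) smoothing on a tiny made-up binary dataset (`fit_bernoulli_nb` and `predict` are hypothetical helper names, not from the lecture):

```python
import numpy as np

# Tiny made-up dataset: rows = documents, columns = binary word indicators.
X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([1, 1, 0, 0])

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Class priors and smoothed per-class feature probabilities."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / len(X)
        # Laplace smoothing: no feature probability is ever exactly 0 or 1,
        # so unseen feature values can't zero out the whole likelihood.
        theta = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
        params[c] = (prior, theta)
    return params

def predict(params, x):
    # Pick the class maximizing log P(c) + sum_j log P(x_j | c).
    def log_post(c):
        prior, theta = params[c]
        return np.log(prior) + np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))
    return max(params, key=log_post)

params = fit_bernoulli_nb(X, y)
print(predict(params, np.array([1, 0, 0])))  # class 1 wins on this example
```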
- Murphy, §3.1-3.4: Generative Models for Discrete Data
- Murphy, §3.5: Naive Bayes Classifiers
- Murphy, §8.6: Generative vs. Discriminative Classifiers
- CS229, Lecture Notes #2: Part IV, Generative Learning Algorithms
Advanced Reading, only if you're interested:
- Paper: Ng, A. and Jordan, M., 2001. "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes". NIPS.
- Paper: Zhang, H., 2004. "The optimality of naive Bayes". AA, 1(2), p.3.
- Paper: Domingos, P. and Pazzani, M., 1997. "On the optimality of the simple Bayesian classifier under zero-one loss". Machine learning, 29(2-3), pp.103-130.
Lecture 8. Support Vector Machines, Part I, 2016-10-03
- Find a linear classifier that separates the training data
- Maximize the margin to the closest training points
- Get a constrained optimization problem (stated after this list)
- Understand the geometric meaning of the optimization variables
- Unlike our previous models, there is no corresponding probabilistic model
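For reference, a standard statement of the hard-margin primal (the notation assumes training pairs $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$):

$$\min_{w,\,b}\ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(w^\top x_i + b) \ge 1, \quad i = 1, \dots, n$$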
Lecture 9. Support Vector Machines, Part II, 2016-10-05
- Optimal soft-margin hyperplane objective (stated below)
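The corresponding soft-margin primal, with slack variables $\xi_i$ and penalty parameter $C > 0$ (notation as in the hard-margin statement above):

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$$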
Lecture 10. Bias-Variance Tradeoff, 2016-10-10
- Bias vs. variance (the squared-loss decomposition is stated below)
- Visualization of bias vs. variance
- Model complexity
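For reference, the standard squared-loss decomposition, assuming $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and a learned predictor $\hat{f}$:

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{noise}}$$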
- Bishop, §3.2: The Bias-Variance Decomposition
Lecture 11. Decision Trees & Information Theory, 2016-10-12
- Entropy and mutual information (definitions below)
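For reference, the definitions this topic rests on, for discrete random variables with pmfs $p(x)$, $p(y)$ and joint pmf $p(x, y)$:

$$H(X) = -\sum_x p(x) \log p(x), \qquad I(X; Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X \mid Y)$$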