# Lecture 2. Linear Algebra Review, 2016-09-12

Learning Objectives:

• Basic matrix operations. You should be able to interpret all of these geometrically AND write down generic formulas for each.
  • Vector addition, subtraction, and scaling.
  • Matrix-vector multiplication, and interpretations in terms of the rows and columns of the matrix.
  • Matrix-matrix multiplication, and interpretations.
  • Dot products, including how they relate to projections and the angle between vectors. More generally, inner products.
  • Projection onto a vector or onto a subspace.
  • Matrix transposes. Transpose of products, transpose of sums.
• Basic matrix properties
  • Understand the rank of a matrix both algebraically and geometrically. What does it mean for a matrix to have full rank?
  • Be able to describe the image and kernel of a matrix using set notation.
  • Know all of the common equivalent conditions for a matrix to be invertible, in terms of the eigenvalues, the columns, the determinant, etc.
• Vector spaces
  • Definition of an abstract vector space / subspace.
  • Definition of linearly independent vectors
  • Definition of basis, span, and dimension
  • Orthogonality and orthogonal / orthonormal bases
  • Know how to compute the coordinate vector for an element of an abstract vector space, with respect to a known basis.
  • Gram-Schmidt orthogonalization (see the sketch after this list)
  • Know what a linear transformation is.
• Norms
  • Definition of a norm, and the properties that every norm must satisfy.
  • Understand and interpret geometrically the 1-norm, 2-norm, and infinity-norm. More generally, the p-norms.
• Special matrices. Know the properties and common use cases for each. For example, what can we say about the eigenvalues/eigenvectors for each special matrix below? How do we multiply these matrices or take powers? What can we say about the rank?
  • Symmetric / Hermitian matrices
  • Orthogonal / unitary matrices
  • Diagonal matrices
  • Triangular matrices
  • Positive-definite and positive-semidefinite matrices
• Eigenvalues / eigenvectors
  • Geometric interpretation
  • Matrix diagonalization. Which matrices are diagonalizable and how can you tell?
  • Spectral theorem
• Singular value decomposition
  • Geometric interpretation (the image of the unit sphere under any matrix is a hyperellipse)
  • What does the SVD look like for the special matrices listed above?
  • How can we interpret the singular values?
  • How does the SVD relate to the rank of a matrix?
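
For the Gram-Schmidt item above, a minimal NumPy sketch (the helper name `gram_schmidt` and the example matrix are illustrative, not from the lecture): it orthonormalizes the columns of a matrix with linearly independent columns, checks orthonormality, and uses `np.linalg.matrix_rank`, which computes the rank from the singular values.

```python
import numpy as np

def gram_schmidt(A):
    """Orthonormalize the columns of A (assumed linearly independent)
    using classical Gram-Schmidt."""
    A = np.asarray(A, dtype=float)
    Q = np.zeros_like(A)
    for j in range(A.shape[1]):
        v = A[:, j].copy()
        for i in range(j):
            # Subtract the projection of column j onto the already-built q_i.
            v -= (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 2.0]])
Q = gram_schmidt(A)
print(np.round(Q.T @ Q, 6))        # identity matrix: the columns are orthonormal
print(np.linalg.matrix_rank(A))    # 2, computed from the singular values
```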

# Lecture 3. Probability Review & Intro to Optimization, 2016-09-14

Learning Objectives:

• Essential concepts in probability
  • You should be able to work with both discrete distributions AND continuous distributions. Understand pdfs and cdfs.
  • What is the difference between a random variable and a distribution?
  • Independent random variables and events.
  • Common distributions and their properties (mean, variance, etc.)
    • Bernoulli
    • Binomial
    • Multinomial
    • Poisson
    • Categorical
    • Gaussian / Normal / Multivariate Normal
    • Beta
    • Dirichlet
  • Random vectors (vectors whose elements are random variables).
  • Expectation, variance, and covariance of random variables.
  • Conditional expectation and variance.
  • Functions of random variables (compute expectation of functions of random variables, etc.)
  • Joint distributions. Marginalization. Marginal distributions.
  • Bayes' theorem (see the worked example after this list).
  • Law of Total Probability
  • You should be comfortable with conditional probability.
  • Infinite sequences of random variables.
  • Basic understanding of likelihood
• Convex functions: definition and properties
• Constrained vs. unconstrained optimization
• Basic understanding of Lagrange duality
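
A small worked example of Bayes' theorem and the law of total probability from the list above; the diagnostic-test numbers are made up purely for illustration.

```python
# Hypothetical numbers for a diagnostic test, chosen only for illustration.
p_d = 0.01        # P(disease): prior / prevalence
p_pos_d = 0.95    # P(+ | disease): sensitivity
p_pos_nd = 0.05   # P(+ | no disease): false-positive rate

# Law of total probability: P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes' theorem: P(D | +) = P(+|D) P(D) / P(+)
p_d_pos = p_pos_d * p_d / p_pos
print(f"P(disease | positive) = {p_d_pos:.3f}")   # ~0.161: still unlikely despite a positive test
```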

# Lecture 4. Linear Regression, Part I, 2016-09-19

Learning Objectives:

• Supervised learning regression problem
• Understand the assumptions made by the model of linear regression
• Understand the difference between input $x$ and features $\phi(x)$
• Intuitively understand the least squares objective function
• Use gradient descent to minimize the objective function
• Use matrix algebra to compute the gradient of the objective
• Pros and cons of batch vs. stochastic gradient descent
• Use the normal equations to find the least squares solution (sketched after this list)
• Know how to derive the pseudoinverse
• Understand the normal equations geometrically
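
A minimal sketch of the two fitting approaches above on synthetic data (the data-generating weights and step size are arbitrary choices, not from the lecture): batch gradient descent on the least squares objective, followed by the closed-form solution from the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=100)   # synthetic data, arbitrary true weights

# Features phi(x) = [1, x]: a bias term plus the raw input.
Phi = np.column_stack([np.ones_like(x), x])

# Batch gradient descent on J(w) = (1/2n) ||Phi w - y||^2.
w = np.zeros(2)
lr = 0.1
for _ in range(2000):
    grad = Phi.T @ (Phi @ w - y) / len(y)   # gradient of the averaged least squares objective
    w -= lr * grad

# Normal equations: w = (Phi^T Phi)^{-1} Phi^T y (the pseudoinverse applied to y).
w_closed = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

print(w)          # close to [1.0, 2.0]
print(w_closed)   # essentially the same solution
```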

• Murphy, §7.1-7.3: Linear Regression
• Bishop, §1.1: Polynomial Curve Fitting Example
• Bishop, §3.1: Linear Basis Function Models
• CS229, Lecture Notes #1: Part I, Linear Regression

# Lecture 5. Linear Regression, Part II, 2016-09-21

Learning Objectives:

• Overfitting and the need for regularization
• Write the objective function for lasso and ridge regression (ridge is sketched after this list)
• Use matrix calculus to find the gradient of the regularized objective
• Understand the probabilistic interpretation of linear regression
  • MLE formulation of the linear regression objective
  • MAP formulation of regularized linear regression
  • Different priors correspond to different regularization penalties
• Understand probabilistic modeling at a high level
  • Goals and assumptions of maximum likelihood estimation
  • Goals and assumptions of MAP estimation
  • Priors, likelihoods, and posteriors
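
A short sketch of the ridge case above (function names are illustrative): the regularized objective $\|\Phi w - y\|^2 + \lambda\|w\|^2$, its gradient, and its closed-form minimizer.

```python
import numpy as np

def ridge_grad(Phi, y, w, lam):
    """Gradient of the ridge objective ||Phi w - y||^2 + lam ||w||^2."""
    return 2 * Phi.T @ (Phi @ w - y) + 2 * lam * w

def ridge_fit(Phi, y, lam):
    """Closed-form minimizer: w = (Phi^T Phi + lam I)^{-1} Phi^T y."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)
```

Under the probabilistic interpretation, this minimizer is the MAP estimate for a Gaussian likelihood with a zero-mean Gaussian prior on $w$; lasso swaps the squared penalty for $\lambda\|w\|_1$ and has no closed-form solution.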

• Murphy, §7.5: Ridge Regression

Advanced Reading, only if you're interested:

• Murphy, §13.3: $\ell_1$ Regularization Basics
• Murphy, §13.4: $\ell_1$ Regularization Algorithms

# Lecture 6. Logistic Regression, 2016-09-26

Learning Objectives:

• Understand the supervised learning classification problem formulation
• Understand the probabilistic interpretation of logistic regression
• Know why we use the sigmoid function rather than a hard threshold
• Write down the likelihood function
• Be able to take the gradient of the negative log-likelihood objective
• Use Newton's method to find the maximum likelihood parameter estimate (sketched after this list)
• Understand why Newton's method applies here
• Understand the difference between gradient descent and Newton's method
• Understand Newton's method geometrically for one-dimensional problems
• Know that there is no closed-form solution for logistic regression
• Understand logistic regression as a linear classifier
• Know how logistic regression can be generalized to softmax regression for multiclass problems
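
A minimal sketch of the Newton's method item above (names are illustrative; it assumes the Hessian stays invertible, which can fail when the classes are perfectly separable, in which case a regularization term is usually added).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logreg_newton(X, y, n_iter=10):
    """Logistic regression fit by Newton's method.
    X: (n, d) design matrix, y: (n,) labels in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)              # predicted probabilities
        grad = X.T @ (p - y)            # gradient of the negative log-likelihood
        s = p * (1 - p)                 # per-example weights
        H = X.T @ (X * s[:, None])      # Hessian: X^T diag(s) X
        w -= np.linalg.solve(H, grad)   # Newton step
    return w
```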

# Lecture 7. Naive Bayes, 2016-09-28

Learning Objectives:

• Understand the difference between generative and discriminative classifiers
• Know which models we've used belong to which category
• Naive Bayes Classifiers
  • Understand the conditional independence assumption and its implications
  • Know how to write down the likelihood function
  • Be able to compute MLE/MAP estimates by hand
  • Understand Laplace smoothing and the problem it solves (see the sketch after this list)
• Compare Logistic Regression and Naive Bayes
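
One concrete instance of the above, sketched under the assumption of binary features: a Bernoulli naive Bayes classifier with Laplace (add-$\alpha$) smoothing. The function names are illustrative.

```python
import numpy as np

def nb_fit(X, y, alpha=1.0):
    """Bernoulli naive Bayes with Laplace (add-alpha) smoothing.
    X: (n, d) binary features, y: (n,) labels in {0, ..., C-1}."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    # Smoothed estimate of P(x_j = 1 | y = c); alpha = 0 recovers the plain MLE.
    theta = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                      for c in classes])
    return classes, priors, theta

def nb_predict(X, classes, priors, theta):
    # log P(y = c) + sum_j [ x_j log theta_cj + (1 - x_j) log(1 - theta_cj) ]
    log_post = (np.log(priors)
                + X @ np.log(theta).T
                + (1 - X) @ np.log(1 - theta).T)
    return classes[np.argmax(log_post, axis=1)]
```

The smoothing prevents a feature value never observed in some class from forcing an estimated probability of zero, which would otherwise make that class's likelihood zero for any example containing it.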

• Murphy, §3.1-3.4: Generative Models for Discrete Data
• Murphy, §3.5: Naive Bayes Classifiers
• Murphy, §8.6: Generative vs. Discriminative Classifiers
• CS229, Lecture Notes #2: Part IV, Generative Learning Algorithms

# Lecture 8. Support Vector Machines, Part I, 2016-10-03

Learning Objectives:

• Formulate learning a linear classifier as an optimization problem
• Understand the goal of maximizing the margin
• Derive the resulting constrained optimization problem (see the formulation after this list)
• Understand the geometric meaning of the variables (weight vector, bias, and margin)
• Recognize that, unlike earlier models, there is no corresponding probabilistic model
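
Putting these pieces together (assuming labels $y_i \in \{-1, +1\}$ and linearly separable training data), maximizing the margin $2/\|w\|$ is equivalent to the constrained problem

$$
\min_{w,\,b} \ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, n.
$$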

# Lecture 9. Support Vector Machines, Part II, 2016-10-05

Learning Objectives:

• Write down the objective for the optimal soft-margin hyperplane (shown below)
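
For reference, the objective referred to above, with slack variables $\xi_i$ allowing margin violations and a parameter $C > 0$ trading off margin size against violations:

$$
\min_{w,\,b,\,\xi} \ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.
$$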

# Lecture 10. Bias-Variance Tradeoff, 2016-10-10

Learning Objectives:

• Understand the tradeoff between bias and variance
• Visualize the bias-variance tradeoff
• Understand how model complexity affects bias and variance
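
The decomposition behind these objectives, for squared error with $y = f(x) + \varepsilon$, $\mathbb{E}[\varepsilon] = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ fit on a random training set:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}.
$$

More complex models typically have lower bias and higher variance; good model selection balances the two terms.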

• Bishop, §3.2: The Bias-Variance Decomposition

# Lecture 11. Decision Trees & Information Theory, 2016-10-12

Learning Objectives:

• Entropy and mutual information
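
For reference, the two quantities for discrete random variables $X$ and $Y$:

$$
H(X) = -\sum_{x} p(x)\log p(x),
\qquad
I(X;Y) = \sum_{x,\,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X \mid Y).
$$

In decision tree learning, the information gain of a candidate split is the mutual information between the splitting feature and the label.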