Week 1
Overview of Machine Learning
Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.
Supervised Machine Learning
Unsupervised Machine Learning
Supervised Machine Learning
Algorithms that learn x-to-y, or input-to-output, mappings
They are given "right answers" to learn from - meaning they are given the correct y label for an input x.
Regression
Using inputs x and output labels y to predict continuous values.
Fitting a line (or other complex curve) to the data
The task of the learning algorithm is to produce more "right" answers: predictions for new inputs x it has not seen before
Example: Housing price prediction
Classification
Using given inputs, classify into output categories
Classification algorithms predict categories
Categories can be numeric or non-numeric. However, the prediction is a discrete and finite set of possible outputs.
May also have multiple inputs
Example: Cancer cell detection
Unsupervised Machine Learning
Algorithm learns by itself - it is not given y outputs
Find some patterns or something interesting in the data
Clustering
Algorithm that may group the data into clusters depending on some common pattern that it finds
Example: Google News, Market Segmentation
Anomaly Detection
Detect unusual events in data
Example: Detect fraud in the financial system
Dimensionality Reduction
Reduce a big data set to a smaller one by compressing it while losing as little information as possible
Useful Terminology
Training set: Dataset used to train the model
Input: The feature or input variable
Output: The target or output variable that you are trying to predict
Training example: a single row of the training set, denoted $(x^{(i)}, y^{(i)})$ for the $i$-th example
Regression Model
Linear Regression
Fitting a straight line to the data
To train the model, you feed the training set, both the input features and the targets, to the learning algorithm. It then produces a function $f$, also called a hypothesis function.
We can use the model by giving the input x to f such that $\hat{y} = f(x)$, where $\hat{y}$ is the predicted y value.
$f_{w,b}(x) = wx + b$. The values of w and b determine the prediction, and different values will output different predictions.
This is linear regression with one variable, or one input feature x, also called Univariate Linear Regression.
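As a concrete illustration, here is a minimal sketch of the model function in Python with NumPy; the function name, parameter values, and data are illustrative, not taken from the course:

```python
import numpy as np

# Minimal sketch of the univariate model f_wb(x) = w * x + b
def predict(x, w, b):
    """Compute the model's prediction y-hat for input(s) x."""
    return w * x + b

# Hypothetical parameter values; different w, b give different predictions
w, b = 200.0, 100.0
x_train = np.array([1.0, 1.5, 2.0])  # e.g. house sizes in 1000s of sq ft
print(predict(x_train, w, b))        # [300. 400. 500.] (prices in $1000s)
```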
Cost Function
To implement Linear Regression, we need to define a cost function
It tells us how well the model is doing, so we can refine it
Model is represented by $f_{w,b}(x) = wx + b$. w and b are the parameters of the model, also referred to as coefficients or weights.
w represents the slope of the line, while b represents the y-intercept.
How do we tell whether the line "fits the data" or not? Intuitively, it is a line that passes through or near the training examples
Model notation: $\hat{y}^{(i)} = f_{w,b}(x^{(i)}) = wx^{(i)} + b$
Cost function measures how close $\hat{y}^{(i)}$ is to $y^{(i)}$ by calculating the squared error $(\hat{y}^{(i)} - y^{(i)})^2$
Final cost function: $J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$, where m is the number of training examples. We want to find values of w and b which minimize this function.
If you visualize the cost function $J(w,b)$, you get a 3D surface plot. You can look at it from above via a contour plot, which slices the surface at constant values of J. For more intuition, look up visualizations of the cost function.
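As a sketch, the squared error cost can be computed like this in Python/NumPy (the function name and toy data are my own, not the course's reference code):

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared error cost: J(w, b) = (1 / 2m) * sum((f_wb(x_i) - y_i)^2)."""
    m = x.shape[0]          # number of training examples
    f_wb = w * x + b        # predictions for every example at once
    return np.sum((f_wb - y) ** 2) / (2 * m)

# Toy training set: a perfect fit (w=2, b=0) drives the cost to zero
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([2.0, 4.0, 6.0])
print(compute_cost(x_train, y_train, w=2.0, b=0.0))  # 0.0
print(compute_cost(x_train, y_train, w=1.0, b=0.0))  # ~2.33
```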
Gradient Descent
Gradient Descent applies not just to Linear Regression but also to more general functions $J(w_1, w_2, \ldots, w_n, b)$.
Gradient Descent Intuition
Have some function $J(w,b)$
Want to minimize it
Outline of the process
Start with some w, b (e.g., set w = 0, b = 0)
Keep changing w, b to reduce $J(w,b)$
Until we settle at or near a minimum
For non-convex functions, you may end up at a local minimum
Implementation
On each step, w is updated to $w = w - \alpha \frac{\partial}{\partial w} J(w,b)$
$\alpha$ refers to the learning rate. It is always a positive number; a negative $\alpha$ would flip the update and move w uphill, increasing the cost instead of decreasing it.
b is updated to $b = b - \alpha \frac{\partial}{\partial b} J(w,b)$
To simultaneously update w and b (see the sketch after this list),
Set a temporary variable $\text{tmp}_w = w - \alpha \frac{\partial}{\partial w} J(w,b)$
Another temporary variable $\text{tmp}_b = b - \alpha \frac{\partial}{\partial b} J(w,b)$
Update w and b to the new values only after computing both $\text{tmp}_w$ and $\text{tmp}_b$: $w = \text{tmp}_w$, $b = \text{tmp}_b$
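A minimal sketch of one simultaneous-update step in Python, assuming a hypothetical compute_gradient(w, b) helper that returns both partial derivatives of J at the current parameters:

```python
def gradient_step(w, b, alpha, compute_gradient):
    """One gradient descent step with a simultaneous update of w and b."""
    # Evaluate BOTH partial derivatives at the current (w, b) first...
    dj_dw, dj_db = compute_gradient(w, b)
    # ...then update, so neither update sees the other's new value
    tmp_w = w - alpha * dj_dw
    tmp_b = b - alpha * dj_db
    return tmp_w, tmp_b
```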
Near a local minimum,
Derivative becomes smaller
Update steps become smaller
So gradient descent can reach a local minimum even with a fixed learning rate
Gradient Descent for Linear Regression
Repeat until convergence:
$w = w - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x^{(i)}$
$b = b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$
These are the general update rules with the partial derivatives of the squared error cost J plugged in.
Batch Gradient Descent: On every step of gradient descent we look at all the training examples instead of a subset. We're computing a sum over all m training examples! This is expensive to compute for large training sets. Other versions exist which use smaller subsets of the data at each step.
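Putting the pieces together, here is a hedged sketch of batch gradient descent for univariate linear regression in Python/NumPy; the learning rate, iteration count, and all names are illustrative choices, not the course's reference implementation:

```python
import numpy as np

def compute_gradient(x, y, w, b):
    """Partial derivatives of the squared error cost J(w, b)."""
    m = x.shape[0]
    err = (w * x + b) - y        # f_wb(x_i) - y_i for all i
    dj_dw = np.sum(err * x) / m  # dJ/dw
    dj_db = np.sum(err) / m      # dJ/db
    return dj_dw, dj_db

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Batch gradient descent: every step sums over all m examples."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w, b = w - alpha * dj_dw, b - alpha * dj_db  # simultaneous update
    return w, b

# Toy data generated by y = 2x + 1; the loop should recover w ~ 2, b ~ 1
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.0, 5.0, 7.0, 9.0])
w, b = gradient_descent(x_train, y_train, w=0.0, b=0.0, alpha=0.05, num_iters=5000)
print(w, b)  # approximately 2.0 and 1.0
```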