Week 3

Classification with Logistic Regression

We want to use learning algorithms to classify data into two or more categories represented by the y variable.

The simplest case is binary classification, where the output is represented as 0 (a negative result) or 1 (a positive result); a minimal sketch of this prediction rule follows the list below.

  • if $f_{w,b}(x) < 0.5 \to \hat{y} = 0$

  • if $f_{w,b}(x) \ge 0.5 \to \hat{y} = 1$
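As a minimal sketch of this thresholding rule (the function name `predict_label` and the explicit threshold argument are my own choices, not from the course):

```python
def predict_label(f_x, threshold=0.5):
    """Map a model output f(x) in [0, 1] to a class label y_hat in {0, 1}."""
    # y_hat = 1 when the model output meets the threshold, otherwise 0
    return 1 if f_x >= threshold else 0

print(predict_label(0.3))  # -> 0
print(predict_label(0.7))  # -> 1
```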

We don't use Linear Regression for classification because it is very sensitive to outliers: a single extreme example can tilt the best-fit line and shift where the threshold is crossed, so it doesn't give us accurate predictions.

The dividing line is called the Decision Boundary.

Logistic Regression

Let's use the cancer tumor prediction example: 1 represents a malignant tumor, and 0 represents a non-malignant tumor. Linear Regression is not a good algorithm for this problem.

However, Logistic Regression can fit an S-shaped curve to the data, which is more accurate. We want outputs between 0 and 1.

We use the sigmoid (or logistic) function, which always gives an output between 0 and 1 (sketched in code after the list below).

  • $g(z) = \frac{1}{1 + e^{-z}}$

  • $0 < g(z) < 1$
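A minimal NumPy sketch of the sigmoid function (the name `sigmoid` is my own; the formula is the one above):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Outputs are always strictly between 0 and 1
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[4.5e-05, 0.5, 0.99995]
```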

Here is the math for the logistic regression model (a code sketch follows the list):

  • A straight-line function was defined as $z = \vec{w} \cdot \vec{x} + b$

  • $g(z) = \frac{1}{1 + e^{-z}}$

  • Therefore, $f_{\vec{w},b}(\vec{x}) = \frac{1}{1 + e^{-(\vec{w} \cdot \vec{x} + b)}}$

  • This represents the probability that the label is 1.
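Putting the pieces together, here is a sketch of the model $f_{\vec{w},b}(\vec{x}) = g(\vec{w} \cdot \vec{x} + b)$; the parameter values and the single two-feature example below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f_wb(x, w, b):
    """Logistic regression model: estimated probability that the label is 1."""
    z = np.dot(w, x) + b   # linear part: z = w . x + b
    return sigmoid(z)      # squash z into the range (0, 1)

# Made-up parameters and one example with two features
w = np.array([1.0, -2.0])
b = 0.5
x = np.array([0.3, 0.1])

prob = f_wb(x, w, b)               # ~0.65
y_hat = 1 if prob >= 0.5 else 0    # apply the 0.5 threshold
print(prob, y_hat)
```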

Applying a threshold to predict whether y is 1 or 0 splits the input space into two regions; the boundary separating them is called the Decision Boundary (see the sketch after this list).

  • $g(z) \ge 0.5$ whenever $\vec{w} \cdot \vec{x} + b \ge 0$

  • The decision boundary corresponds to $z = 0$. This can be a straight line or a polynomial curve, depending on the expression used for z.
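As a concrete sketch of a linear decision boundary: with two features and made-up parameters w = [1, 1], b = -3, the boundary z = 0 is the line x1 + x2 - 3 = 0:

```python
import numpy as np

w = np.array([1.0, 1.0])   # made-up parameters
b = -3.0

def z(x):
    return np.dot(w, x) + b

# Points with z >= 0 are predicted y_hat = 1; points with z < 0 are predicted y_hat = 0.
# The decision boundary is the set of points where z = 0, i.e. x1 + x2 - 3 = 0.
for point in [np.array([1.0, 1.0]), np.array([2.0, 2.0]), np.array([1.5, 1.5])]:
    print(point, z(point), 1 if z(point) >= 0 else 0)
```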

Cost Function for Logistic Regression

We need a different cost function to choose better parameters for Logistic Regression.

Squared Error Cost:

  • In Linear Regression, we used the squared error cost: $J(\vec{w}, b) = \frac{1}{m} \sum_{i=1}^m \frac{1}{2} (f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)})^2$

  • With the sigmoid model plugged into $f_{\vec{w},b}$, this squared error cost is non-convex.

  • For Gradient Descent, that means there are lots of local minima you can get stuck in, so we use a different cost function, built from a per-example loss (see the sketch after this list).

  • $L(f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)})$ is the loss on a single training example.
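For illustration only, here is a sketch of the squared error cost above applied to a sigmoid model on a tiny made-up data set; this is the cost we do not use for logistic regression, precisely because it is non-convex in w and b:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squared_error_cost(X, y, w, b):
    """J(w, b) = (1/m) * sum_i (1/2) * (f_wb(x_i) - y_i)^2."""
    m = X.shape[0]
    f = sigmoid(X @ w + b)   # model outputs for all m examples
    return np.sum(0.5 * (f - y) ** 2) / m

# Tiny made-up training set: 3 examples, 2 features each
X = np.array([[0.5, 1.5], [1.0, 1.0], [2.0, 0.5]])
y = np.array([0.0, 0.0, 1.0])
print(squared_error_cost(X, y, w=np.array([1.0, 1.0]), b=-2.0))
```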

The Logistic Loss Function tells us how well we're doing on a single example (a sketch follows the list below).

  • $y^{(i)} = 1 \to -\log(f_{\vec{w},b}(\vec{x}^{(i)}))$

  • $y^{(i)} = 0 \to -\log(1 - f_{\vec{w},b}(\vec{x}^{(i)}))$
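A sketch of this loss, plus a cost that averages it over the training set (the function names and the small epsilon guard against log(0) are my own additions):

```python
import numpy as np

def logistic_loss(f_x, y, eps=1e-15):
    """Loss on one example: -log(f) if y = 1, -log(1 - f) if y = 0."""
    f_x = np.clip(f_x, eps, 1 - eps)   # keep f strictly inside (0, 1) so log is defined
    return -np.log(f_x) if y == 1 else -np.log(1.0 - f_x)

def cost(F, Y):
    """Average the per-example loss over all m training examples,
    given model outputs F = [f_wb(x_i)] and labels Y = [y_i]."""
    return float(np.mean([logistic_loss(f, y) for f, y in zip(F, Y)]))

# Confident, correct predictions give a small loss; confident, wrong ones a large loss
print(logistic_loss(0.9, 1))             # ~0.105
print(logistic_loss(0.9, 0))             # ~2.303
print(cost([0.9, 0.2, 0.8], [1, 0, 1]))  # average loss over three examples
```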
