Machine Learning

Search Engine: listing results in order of relevance

Facial recognition on Facebook and Instagram

Siri and Alexa

Virtual Reality (VR) glasses that provide gesture-controlled gaming

Shopping suggestions on Amazon as well as customer segmentation

Uber suggesting drop-off locations

Filtering spam emails

LINEAR REGRESSION

Linear regression helps us find a relationship between two variables, e.g. the relationship between height and weight

Consider the following dataset of the fuel consumption of a car in mpg (miles per gallon) versus how heavy the car is (in thousands of pounds)

[Table: car weight in thousands of pounds vs fuel efficiency in mpg]

If we plotted these points on a scatter plot, we would have the figure shown below

[Figure: scatter plot of car weight vs mpg]

Now let’s attempt to draw a line of best fit

This line would be defined by the equation y = mx + c, where

y is the dependent variable, which represents the miles per gallon (the value we want to predict)

m is the slope of the line

x is the independent variable, which represents the weight in thousands of pounds (the value we input)

c is the y-intercept

Python Code

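A minimal sketch, assuming scikit-learn and a few hypothetical weight/mpg pairs standing in for the dataset above:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: weight (1000s of pounds) versus fuel efficiency (mpg)
X = np.array([[2.0], [2.5], [3.0], [3.5], [4.0], [4.5]])
y = np.array([23.0, 21.0, 19.0, 17.5, 15.5, 13.5])

model = LinearRegression().fit(X, y)
print("slope m:", model.coef_[0])
print("intercept c:", model.intercept_)

# Predict the mpg of a 4000-pound car (x = 4.0 in thousands of pounds)
print("predicted mpg:", model.predict(np.array([[4.0]]))[0])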

In machine learning language, the equation for a linear regression model can be given as follows:

y′ = b + w_1x_1

y′ is the predicted label, the output of the model

w_1 is the weight of the feature. This is a parameter calculated during training

b is the bias of the model. It is sometimes referred to as w_0. Bias is also a parameter of the model and is calculated during training.

x_1 is a feature of the model, the input

The linear regression equation can also be written in the form

Y = a + bx

where a = [(Σy)(Σx²) − (Σx)(Σxy)] / [n(Σx²) − (Σx)²]

and b = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²]

The linear regression coefficient (the slope), B_1, is given by

B_1 = Σ[(x_i − x̄)(y_i − ȳ)] / Σ[(x_i − x̄)²]
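As a sanity check, the closed-form formulas above can be evaluated directly with NumPy; the x and y values below are hypothetical:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical x values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical y values
n = len(x)

# Intercept a and slope b from the closed-form formulas above
denom = n * (x**2).sum() - x.sum()**2
a = (y.sum() * (x**2).sum() - x.sum() * (x*y).sum()) / denom
b = (n * (x*y).sum() - x.sum() * y.sum()) / denom

# The deviation form B_1 gives the same value as the slope b
B1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()

print(a, b, B1)   # b and B1 agree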

In this example, the model is defined as y′ = 30 + (−3.6)x_1, where x_1 is the weight in thousands of pounds.

We can use this to predict that a 4,000-pound car will have a fuel efficiency of 30 − 3.6 × 4 = 15.6 miles per gallon.

Loss

Loss in a linear regression model measures the distance between the predicted value and the actual value. The goal of training a model is to minimize the loss (reduce to its lowest possible value).
It is noteworthy that loss focuses on the distance between the values, not the direction. For example, if a model predicts 3 and the actual value is 5, we are not bothered that the difference is negative (3 − 5 = −2), only that the distance between the values is 2.

Some types of loss used in regression problems include

Mean absolute error (MAE): (1/n) Σ |actual value − predicted value|

Mean squared error (MSE): (1/n) Σ (actual value − predicted value)²

Notice that the main difference between MAE and MSE is the squaring. When the difference between the predicted value and the actual value is large, squaring makes the loss even larger. But when this difference is small (less than 1), squaring makes it smaller

A model trained with MAE sits further from the outliers but closer to most of the data points, while a model trained with MSE sits closer to the outliers but further away from the other data points
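A short sketch of both losses in NumPy, using hypothetical values in which one prediction badly misses an outlier:

import numpy as np

actual = np.array([5.0, 3.0, 8.0, 20.0])     # hypothetical actual values
predicted = np.array([4.5, 3.5, 7.0, 10.0])  # hypothetical predictions; 20 -> 10 misses an outlier

mae = np.abs(actual - predicted).mean()      # treats all errors linearly
mse = ((actual - predicted)**2).mean()       # squaring lets the outlier dominate
print(mae, mse)                              # 3.0 versus 25.375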

Gradient descent is a technique to reduce the loss by adjusting the weights and bias that produce the model.

First, the model begins training with near-zero values for the weight and bias and calculates the loss. Then it repeats the following steps to reduce the loss (a short sketch in code follows the list)

1. Determine the direction to move to reduce the loss

2. Increase or reduce the weight and/or bias in that direction such that the loss is reduced
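A minimal sketch of these two steps for a single-feature model, assuming MSE loss and a fixed, hypothetical learning rate:

import numpy as np

# Hypothetical data: weight (1000s of pounds) versus mpg
x = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
y = np.array([23.0, 21.0, 19.0, 17.5, 15.5, 13.5])

w, b = 0.0, 0.0        # begin with near-zero weight and bias
lr = 0.01              # learning rate (hypothetical value)
losses = []            # loss recorded at every step

for step in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    losses.append((error ** 2).mean())     # MSE loss
    # Step 1: the gradients give the direction that reduces the loss
    grad_w = 2 * (error * x).mean()
    grad_b = 2 * error.mean()
    # Step 2: move the weight and bias against the gradients
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b, losses[-1])

Plotting the recorded losses against the step number gives exactly the loss curve described next.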

Loss Curve

A loss curve is usually generated when training a model. This curve shows how the loss changes as the model trains

Model with multiple features

A model with several features combines them with one weight per feature:

y′ = b + w_1x_1 + w_2x_2 + … + w_nx_n

NAÏVE BAYES

This classification technique is based on conditional probability – the probability that an event will occur given that another event has taken place. It is used, for example, to filter new emails for spam

P(C|A) = P(A|C) P(C) / P(A)

Naïve Bayes classifiers are commonly used because they train well on small datasets

Python Code
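A minimal sketch, assuming scikit-learn's MultinomialNB and a few hypothetical emails:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training emails and labels (1 = spam, 0 = not spam)
emails = ["win a free prize now", "meeting at noon tomorrow",
          "free money claim your prize", "project update attached"]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)       # word counts as features
model = MultinomialNB().fit(X, labels)

new_email = vectorizer.transform(["claim your free prize"])
print(model.predict(new_email))            # expected: [1] (spam)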

SUPPORT VECTOR MACHINE (SVM)

In simplest terms, an SVM is a supervised algorithm that separates data using hyperplanes. In a two-dimensional set of points, the hyperplane is just a straight line that makes the distinction between the two sets; in an n-dimensional space, a hyperplane separates the classes or groups of data. The SVM algorithm does not fit just any hyperplane but tries to find the optimal hyperplane that maximises the distance between the support vectors – points at the forefront of a class and closest to the opposite class.
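A minimal sketch, assuming scikit-learn and two synthetic two-dimensional clusters:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two hypothetical, roughly separable clusters of points
X, y = make_blobs(n_samples=40, centers=2, random_state=0)

# Linear kernel: in 2 dimensions the hyperplane is a straight line
model = SVC(kernel="linear").fit(X, y)
print(model.support_vectors_)   # the points that define the maximum margin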

DECISION TREES

This is a graph that uses a branching method to illustrate the possible outcomes of a decision. It consists of nodes that represent tests/decisions based on features, branches that represent the outcomes of those decisions, and leaf nodes that represent the final prediction – a categorical or numerical value.
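A minimal sketch, assuming scikit-learn and hypothetical weather data:

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: [temperature (°C), humidity (%)] -> play outside? (1/0)
X = [[30, 85], [25, 60], [18, 70], [32, 90], [22, 55], [15, 80]]
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["temperature", "humidity"]))  # learned tests
print(tree.predict([[24, 60]]))   # expected: [1]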

LOGISTIC REGRESSION

This is a classification algorithm used to predict discrete or categorical values. It is particularly useful for binary classification, where the outcome is YES or NO, TRUE or FALSE, 0 or 1.

For example, a bank can design a model to predict whether or not a customer will default on a credit card payment based on spending history

The logistic regression model uses the sigmoid function given as

P = 1 / (1 + e^(−z))

The predicted value, P, lies in the range 0 to 1; z is the input.

P tends towards 1 as z → ∞ and towards 0 as z → −∞

If you instead draw a straight line, Y = m_1x + c_0, on the scatter plot, the predicted value can exceed the range 0 to 1 – which is why the sigmoid is used.

The cutoff probability is 0.5, meaning that any point z (obtained from z = mx + c) whose probability is less than 0.5 is classified as 0, while any point whose probability is greater than 0.5 is classified as 1

Python Code
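A minimal sketch, assuming scikit-learn and a single hypothetical spend-ratio feature:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: monthly spend as a fraction of income
X = np.array([[0.2], [0.4], [0.5], [0.7], [0.8], [0.9]])
y = np.array([0, 0, 0, 1, 1, 1])   # 1 = defaulted on payment

model = LogisticRegression().fit(X, y)

z = model.decision_function(np.array([[0.85]]))  # z = mx + c for a new customer
p = 1 / (1 + np.exp(-z))                         # sigmoid, same as predict_proba
print(p, model.predict(np.array([[0.85]])))      # class uses the 0.5 cutoff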

KNN – K NEAREST NEIGHBOUR

This is one of the basic classification algorithms in machine learning used to predict categorical values. It is applied in pattern recognition and data mining.

Similar data points form clusters when plotted on a scatter plot. A new data point is assigned to the neighbouring group it is most similar to. K is the number of nearest neighbouring points we compare the new data point to. It is a positive integer, and every new data point to be classified requires a computation of which neighbouring group it is closest to

To determine the closest groups, we use different types of metrics

1. Euclidean distance: This is the Cartesian distance between the two points, that is, the length of the straight line that joins the two points in consideration

d = √(Σ_(i=1)^n (x_i − y_i)²)

2. Manhattan distance: This metric finds the total distance travelled by the unknown point along the axes

d = Σ_(i=1)^n |x_i − y_i|

It is advisable to choose an odd value of k. If the input data has more outliers or noise, a higher value of k should be chosen than for a dataset with fewer outliers
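A minimal sketch, assuming scikit-learn and two hypothetical clusters of points:

from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D points and their class labels
X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
y = [0, 0, 0, 1, 1, 1]

# k = 3 (odd, to avoid ties); Euclidean distance is the default metric
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2]]))   # expected: [0], the nearest cluster
# Manhattan distance can be selected with metric="manhattan"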

SELECTING THE ML SOLUTION

1. Problem Statement

If the task is to predict a value such as future stock prices, use supervised learning

2. Size, nature and quality of data

If the data is unlabelled and forms clusters, use unsupervised learning

3. Complexity of the algorithm

MODEL DEPLOYMENT

1. Prepare the model using an ML pipeline (like PyCaret) for deployment in your normal Jupyter notebook

2. Deploy the model to the selected platform: You can use Runway, an enterprise ML platform. Build a web app using Flask or some other API to consume the trained model by providing new data points to get predictions. Deploy the app into a container so that it becomes accessible via a web URL. Create a Docker image and container. Publish the container on Azure Container Registry or AWS (a minimal Flask sketch follows the list)

3. Monitor and update the model
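A minimal sketch of the Flask app in step 2, assuming a trained scikit-learn model saved with joblib (the file name model.joblib is hypothetical):

from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
model = joblib.load("model.joblib")   # load the trained model/pipeline

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # e.g. {"features": [[4.0]]}
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)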
