Machine Learning
- Introduction
- Linear Regression
- Naive Bayes
- Support Vector Machine
- Decision Trees
- Logistic Regression
- K Nearest Neighbour
- Model Deployment
- Modules
- What Next?
Everyday applications of machine learning include:
- Search engines listing results in order of relevance
- Facial recognition on Facebook and Instagram
- Virtual assistants such as Siri and Alexa
- Virtual Reality (VR) headsets that provide gesture-controlled gaming
- Shopping suggestions on Amazon, as well as customer segmentation
- Uber suggesting drop-off locations
- Filtering spam emails
LINEAR REGRESSION
Linear regression helps us to find a relationship between two variables, e.g. the relationship between height and weight
Consider the following dataset of the fuel consumption of a car in mpg (miles per gallon) versus the weight of the car in pounds
[Table: car weight (lb) vs. fuel consumption (mpg)]
If we plotted these points on a scatter plot, we would have the figure shown below
[Figure: scatter plot of mpg vs. car weight]
Now let’s attempt to draw a line of best fit
This line would be defined by the equation y = mx + c, where
- y is the dependent variable; it represents the miles per gallon and is the value we want to predict
- m is the slope of the line
- x is the independent variable; it represents the weight in pounds and is the value we input
- c is the y-intercept
Python Code
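The original listing was not preserved, so the following is a minimal sketch of fitting the line of best fit with scikit-learn; the weight/mpg values are illustrative stand-ins, not the original dataset.

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: weight in thousands of pounds (x) and fuel efficiency in mpg (y)
X = np.array([[2.0], [2.5], [3.0], [3.5], [4.0], [4.5]])
y = np.array([22.8, 21.0, 19.5, 17.2, 15.6, 13.9])

model = LinearRegression()
model.fit(X, y)                       # computes the slope m and intercept c

print("slope m:", model.coef_[0])
print("intercept c:", model.intercept_)
print("predicted mpg for a 4000-lb car:", model.predict([[4.0]])[0])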
In machine learning terminology, the equation for a linear regression model is given as follows
y′ = b + w_1 x_1
where
- y′ is the predicted label and the output
- w_1 is the weight of the feature; this is a parameter calculated during training
- b is the bias of the model, sometimes referred to as w_0; the bias is also a parameter of the model and is calculated during training
- x_1 is a feature of the model and the input
The linear regression equation can also be written in the form
y = a + bx
where a = [(Σy)(Σx^2) − (Σx)(Σxy)] / [n(Σx^2) − (Σx)^2]
and b = [n(Σxy) − (Σx)(Σy)] / [n(Σx^2) − (Σx)^2]
The linear regression coefficient, B_1, is given by
B_1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)^2
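As a quick check on these formulas, here is a small sketch that evaluates them directly with NumPy; the x and y arrays are hypothetical sample data.

import numpy as np

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])
n = len(x)

# Closed-form least-squares intercept (a) and slope (b)
denom = n * np.sum(x**2) - np.sum(x)**2
a = (np.sum(y) * np.sum(x**2) - np.sum(x) * np.sum(x * y)) / denom
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom

# B_1 via the mean-deviation form; it should equal the slope b
B_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

print("a:", a, "b:", b, "B_1:", B_1)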
In this example, the model is defined as y′ = 30 + (−3.6)x_1, where x_1 is the weight in thousands of pounds. We can use this to predict that a 4000-pound car will have a fuel efficiency of y′ = 30 + (−3.6)(4) = 15.6 miles per gallon.

Loss
Loss in a linear regression model measures the distance between the predicted value and the actual value. The goal of training a model is to minimize the loss, i.e. reduce it to its lowest possible value.
It is noteworthy that loss focuses on the distance between the values and not the direction. For example, if a model predicts 3 and the actual value is 5, we are not bothered that the difference is negative (3 − 5 = −2), only that the distance between the values is 2.
Some types of loss used in regression problems include
Mean absolute error (MAE): (1/n) Σ |actual value − predicted value|
Mean squared error (MSE): (1/n) Σ (actual value − predicted value)^2
Notice that the main difference between MAE and MSE is the squaring. When the difference between the predicted value and the actual value is large, squaring makes the loss even larger; but when this difference is small (less than 1), squaring makes it smaller.
As a result, a model trained with MAE stays farther from the outliers but closer to most of the data points, while a model trained with MSE is pulled closer to the outliers and farther from the other data points.
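The difference is easy to see numerically; a minimal sketch with hypothetical values, where the last point acts as an outlier:

import numpy as np

actual = np.array([5.0, 3.0, 8.0, 20.0])   # the last value acts as an outlier
predicted = np.array([4.5, 3.5, 7.0, 10.0])

mae = np.mean(np.abs(actual - predicted))  # (1/n) * sum |actual - predicted|
mse = np.mean((actual - predicted) ** 2)   # (1/n) * sum (actual - predicted)^2

print("MAE:", mae)   # treats all errors equally
print("MSE:", mse)   # dominated by the outlier's squared error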
Gradient descent is a technique for reducing the loss by adjusting the weights and bias of the model.
First, the model begins training with near-zero values for the weight and bias and calculates the loss. It then repeats the following steps to reduce the loss (see the sketch after this list):
1. Determine the direction to move in order to reduce the loss
2. Increase or reduce the weight and/or bias so as to move in that direction
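A minimal gradient-descent sketch for the model y′ = b + w_1 x_1 with MSE loss; the data and learning rate are hypothetical.

import numpy as np

# Hypothetical data generated by y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

w, b = 0.0, 0.0   # start with near-zero weight and bias
lr = 0.01         # learning rate (step size)

for step in range(2000):
    error = (b + w * x) - y          # prediction minus actual
    dw = 2 * np.mean(error * x)      # gradient of MSE with respect to w
    db = 2 * np.mean(error)          # gradient of MSE with respect to b
    w -= lr * dw                     # move against the gradient...
    b -= lr * db                     # ...to reduce the loss

print("w:", w, "b:", b)              # should approach 2 and 1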
Loss Curve
A loss curve is usually generated when training a model. This curve shows how the loss changes as the model trains
Model with multiple features
With n features, the model extends to y′ = b + w_1 x_1 + w_2 x_2 + … + w_n x_n
NAÏVE BAYES
This classification technique is based on conditional probability – the probability that an event will occur given that another event has taken place. It is used, for example, to filter new emails for spam
P(C|A) = P(A|C) P(C) / P(A)
Naïve Bayes classifiers are commonly used because they train well on small datasets
Python Code
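The original listing is not available; below is a minimal sketch using scikit-learn's GaussianNB on a built-in dataset standing in for real data (for spam filtering on word counts, MultinomialNB would be the usual choice).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Built-in dataset used as an illustrative stand-in
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()          # applies Bayes' theorem with a Gaussian likelihood
clf.fit(X_train, y_train)

print("accuracy:", clf.score(X_test, y_test))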
SUPPORT VECTOR MACHINE (SVM)
In simplest terms, an SVM is a supervised algorithm that separates data using hyperplanes. In a two-dimensional set of points, the hyperplane would just be a straight line making the distinction between the two sets; in an n-dimensional space, a hyperplane separates the classes or groups of data. The SVM algorithm does not fit just any hyperplane but tries to find an optimal hyperplane that maximises the distance to the support vectors – the points at the forefront of each class, closest to the opposite class.
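A minimal sketch of fitting a linear SVM to two clusters of 2-D points with scikit-learn; the data is randomly generated for illustration.

from sklearn import svm
from sklearn.datasets import make_blobs

# Two separable clusters of 2-D points (illustrative data)
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

clf = svm.SVC(kernel="linear", C=1.0)   # linear kernel -> straight-line hyperplane
clf.fit(X, y)

print("support vectors:")
print(clf.support_vectors_)             # the points that define the margin
print("prediction for a new point:", clf.predict([[3.0, -4.0]]))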

DECISION TREES
This is a graph that uses a branching method to illustrate the possible outcomes of a decision. It consists of nodes, which represent tests/decisions based on features; branches, which represent the outcomes of those decisions; and leaf nodes, which represent the final prediction – a categorical or numerical value.
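A minimal sketch of training a decision tree with scikit-learn and printing its nodes and leaves; the built-in iris dataset stands in for illustration.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The printout shows the tests at each node and the predictions at the leaves
print(export_text(tree, feature_names=["sepal length", "sepal width",
                                       "petal length", "petal width"]))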

LOGISTIC REGRESSION
This is a classification algorithm used to predict discrete or categorical values. It is particularly useful for binary classification, where the outcome is YES or NO, TRUE or FALSE, 0 or 1.
For example, a bank can design a model to predict whether or not a customer will default on their credit card payment based on spending history
The logistic regression model uses the sigmoid function, given as
P = 1/(1 + e^(-z))
The predicted value, P, will lie in the range 0 to 1; z is the input.
P tends towards 1 as z → ∞ and towards 0 as z → −∞
If you instead drew a straight line, y = m_1 x + c_0, on the scatter plot, the predicted value could fall outside the range 0 to 1.
The cutoff probability is 0.5, meaning that any point z (obtained from z = mx + c) whose probability value is less than 0.5 is classified as 0, while any point whose probability is greater than 0.5 is classified as 1

Python Code
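The original listing was not preserved; here is a minimal sketch of the bank-default example with scikit-learn. The spending figures and labels are hypothetical.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: monthly spend (in thousands) vs. whether the customer defaulted
X = np.array([[0.2], [0.45], [0.8], [1.5], [3.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])   # 1 = defaulted

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba gives P; predict applies the 0.5 cutoff to produce 0 or 1
print("P(default) at 1.2:", clf.predict_proba([[1.2]])[0, 1])
print("predicted class:", clf.predict([[1.2]])[0])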
KNN – K NEAREST NEIGHBOUR
This is one of the basic classification algorithms in machine learning used to predict categorical values. It is applied in pattern recognition and data mining.
Similar data points form clusters when plotted on a scatter plot, and a new data point is assigned to the neighbouring group it is most similar to. K is the number of nearest neighbouring points we wish to compare the new data point to. It is a positive integer, and every new data point to be classified requires a computation to determine which neighbouring group it is closest to

To determine the closest groups, we use different types of distance metrics
1. Euclidean distance: This is the Cartesian distance between the two points, that is, the length of the straight line that joins the two points in consideration
d = √(Σ_(i=1)^n (x_i − y_i)^2)
2. Manhattan distance: This metric sums the distance travelled along each axis between the two points
d = Σ_(i=1)^n |x_i − y_i|
It is advisable to choose an odd value of k to avoid ties. If the input data has more outliers or noise, a higher value of k should be chosen than for a dataset with fewer outliers
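A minimal KNN sketch with k = 3 and the Euclidean metric, using two small made-up clusters:

from sklearn.neighbors import KNeighborsClassifier

# Two made-up clusters of 2-D points
X = [[1, 1], [1, 2], [2, 2],    # cluster 0
     [8, 8], [8, 9], [9, 8]]    # cluster 1
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X, y)

print(knn.predict([[2, 1]]))    # nearest neighbours are in cluster 0 -> [0]
print(knn.predict([[7, 9]]))    # nearest neighbours are in cluster 1 -> [1]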
SELECTING THE ML SOLUTION
1. Problem Statement
If the goal is to predict a value such as future stock prices, use supervised learning
2. Size, nature and quality of data
If the data is unlabelled and naturally clustered, use unsupervised learning
3. Complexity of the algorithm
MODEL DEPLOYMENT
1. Prepare the model using an ML pipeline (like PyCaret) for deployment in your normal Jupyter notebook
2. Deploy the model to the selected platform: You can use Runway, which is an enterprise ML platform. Build a web app using Flask or some other API to consume the trained model by providing new data points to get predictions. Deploy the app into a container so that it becomes accessible via a web URL: create a Docker image and container, then publish the container on Azure Container Registry or AWS
3. Monitor and update the model
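As a sketch of step 2, a minimal Flask app that serves predictions from a saved model; the file name model.pkl and the /predict route are hypothetical choices, not fixed conventions.

import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical file name; the model would have been saved earlier with pickle.dump
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[4.0]]}
    features = request.get_json()["features"]
    prediction = model.predict(features)
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)   # then containerize with Docker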