# Lab 5 - Perceptron (single layer)

The purpose of this lab is for you to familiarise yourself with the Python toolbox for implementing a Perceptron (single layer).

You will use the `Perceptron` class from the package `sklearn.linear_model`. Here is a link to the documentation, which you will need to refer to frequently as you work through this lab:

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html

As usual, we will also import `numpy` and `matplotlib.pyplot`.




In [None]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import Perceptron

# 1. Linearly separable data

Generate 100 instances of the normal random variables `X` and `Y`, where the mean of $(X,Y)=(2,0)$, $X$ and $Y$ both have variance 1, and $X$ and $Y$ are uncorrelated (covariance is 0). (You may refer back to Lab 2.)

Generate 100 instances of the normal random variables `U` and `V`, where the mean of $(U,V)=(10,0)$, $U$ and $V$ both have variance 1, and $U$ and $V$ are uncorrelated.

You should have two 100 x 2 arrays. Concatenate these two arrays into one 200 x 2 array and create a corresponding array of class labels: 100 zeros, followed by 100 ones.

Make a scatter plot of your data, showing the $(X,Y)$ data in red and the $(U,V)$ data in blue.
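The data generation steps above can be sketched as follows. This is one possible approach, not the only one: it assumes NumPy's `default_rng` generator and its `multivariate_normal` method (Lab 2 may have used a different routine), and the seed and the names `xy`, `uv`, `data` and `labels` are arbitrary choices made for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: the seed is arbitrary and only makes the sample reproducible.
rng = np.random.default_rng(0)

# Identity covariance: variance 1 for each coordinate, covariance 0.
cov = np.eye(2)
xy = rng.multivariate_normal(mean=[2, 0], cov=cov, size=100)   # (X, Y) sample
uv = rng.multivariate_normal(mean=[10, 0], cov=cov, size=100)  # (U, V) sample

# Stack into one 200 x 2 array, with labels 0 for (X, Y) and 1 for (U, V).
data = np.concatenate([xy, uv], axis=0)
labels = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

# Scatter plot: (X, Y) in red, (U, V) in blue.
fig, ax = plt.subplots()
ax.scatter(xy[:, 0], xy[:, 1], color="red", label="(X, Y), class 0")
ax.scatter(uv[:, 0], uv[:, 1], color="blue", label="(U, V), class 1")
ax.set_xlabel("first coordinate")
ax.set_ylabel("second coordinate")
ax.legend()
```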

We consider the $(X,Y)$ data to be from one class (label 0) and the $(U,V)$ data to be from another class (label 1). These classes should be linearly separable; that is, you should be able to draw a straight line that has all of class 0 on the left and all of class 1 on the right. (In the unlikely event that this is not the case, generate a new random sample!)

Now have a go at training a perceptron to be able to classify the datapoints. (For simplicity here, do not split the data into a training and testing dataset - just train and test on the whole dataset.) Begin by just running the `Perceptron` model with default settings.

Calling `fit` will have trained the perceptron over repeated passes through the whole dataset (epochs). Look through the default settings that were applied. (Note that we have not discussed regularisation in the context of the perceptron.) Print out how many epochs were performed to train the perceptron.
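A minimal sketch of fitting with default settings and inspecting the result is given below. It regenerates the data so that it runs on its own; the seed and variable names are assumptions made for illustration. `get_params()` lists the settings in use, and the fitted attribute `n_iter_` reports the number of passes over the data that were made before the stopping criterion was met.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Regenerate the two classes (arbitrary seed, for illustration only).
rng = np.random.default_rng(0)
xy = rng.normal([2, 0], 1.0, size=(100, 2))
uv = rng.normal([10, 0], 1.0, size=(100, 2))
data = np.concatenate([xy, uv])
labels = np.repeat([0, 1], 100)  # 100 zeros followed by 100 ones

# Train a perceptron with all settings left at their defaults.
model = Perceptron()
model.fit(data, labels)

# Inspect the settings that were applied and the number of epochs run.
params = model.get_params()
print(params)
print("Epochs performed:", model.n_iter_)
```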

Now run the `predict` method to obtain a predicted class for each data point (see Methods at https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html).

Import functions from `sklearn.metrics` to compute the *accuracy* and the *confusion matrix*, and display these for your data: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics

In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix
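One way the prediction and evaluation steps might look is sketched here. The data are regenerated so the block runs on its own, and the seed and variable names are illustrative assumptions. In the confusion matrix returned by `confusion_matrix`, rows correspond to true classes and columns to predicted classes.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score, confusion_matrix

# Regenerate the two classes (arbitrary seed, for illustration only).
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal([2, 0], 1.0, size=(100, 2)),
    rng.normal([10, 0], 1.0, size=(100, 2)),
])
labels = np.repeat([0, 1], 100)

model = Perceptron().fit(data, labels)

# Predict on the same data used for training (no train/test split here).
predictions = model.predict(data)

acc = accuracy_score(labels, predictions)
cm = confusion_matrix(labels, predictions)
print("Accuracy:", acc)
print("Confusion matrix (rows: true class, columns: predicted class):")
print(cm)
```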



You should have obtained 100% accuracy!

Plot the decision boundary and a scatter plot of the data again, on the same graph. The decision boundary is the line:

\begin{equation}
y=-\frac{w_1}{w_2}x - \frac{w_0}{w_2}
\end{equation}

where $w_1$ and $w_2$ are the 2 weights (stored in an array called *`coef_`*), and $w_0$ is the bias (stored in a variable called *`intercept_`*) of your trained perceptron.

[NB: although `sklearn` calls the bias here the intercept, this is not the same thing as the $y$-intercept of the decision boundary, which is $-w_0/w_2$.]
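The boundary formula above could be plotted along the following lines. This is a sketch under stated assumptions: the seed and variable names are illustrative, and it assumes $w_2 \neq 0$ (if $w_2$ were exactly zero the boundary would be a vertical line and the formula would not apply). Because the boundary is often very steep for this data, the vertical axis is clipped to the range of the data.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Perceptron

# Regenerate the two classes (arbitrary seed, for illustration only).
rng = np.random.default_rng(0)
xy = rng.normal([2, 0], 1.0, size=(100, 2))
uv = rng.normal([10, 0], 1.0, size=(100, 2))
data = np.concatenate([xy, uv])
labels = np.repeat([0, 1], 100)

model = Perceptron().fit(data, labels)

# coef_ has shape (1, 2) and intercept_ has shape (1,) for a two-class problem.
w1, w2 = model.coef_[0]
w0 = model.intercept_[0]

# Decision boundary: w0 + w1*x + w2*y = 0, solved for y (assumes w2 != 0).
xs = np.linspace(data[:, 0].min(), data[:, 0].max(), 50)
ys = -(w1 / w2) * xs - w0 / w2

fig, ax = plt.subplots()
ax.scatter(xy[:, 0], xy[:, 1], color="red", label="(X, Y), class 0")
ax.scatter(uv[:, 0], uv[:, 1], color="blue", label="(U, V), class 1")
ax.plot(xs, ys, color="black", label="decision boundary")
# Keep the data visible even when the boundary is nearly vertical.
ax.set_ylim(data[:, 1].min() - 1, data[:, 1].max() + 1)
ax.legend()
```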

## Optional extension

Implement sequential learning (update the weights after the presentation of each single data point, rather than only after all 200 data points have been presented). For this you will need the `partial_fit` method of your `Perceptron`, which must be told the full set of class labels through its `classes` argument on the first call. Ideally, you should present the data in a new random order for each epoch.
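A possible shape for the sequential learning loop is sketched below. The number of epochs (`n_epochs`), the seed and the variable names are hypothetical choices for illustration; no stopping criterion is applied, so the loop simply runs for a fixed number of epochs.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

# Regenerate the two classes (arbitrary seed, for illustration only).
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal([2, 0], 1.0, size=(100, 2)),
    rng.normal([10, 0], 1.0, size=(100, 2)),
])
labels = np.repeat([0, 1], 100)

classes = np.array([0, 1])  # partial_fit needs to know every possible label
model = Perceptron()
n_epochs = 10  # hypothetical fixed number of epochs

for epoch in range(n_epochs):
    # Present the data in a fresh random order each epoch.
    order = rng.permutation(len(data))
    for i in order:
        # Slicing with i:i+1 keeps the 2D shape (1, 2) that sklearn expects.
        model.partial_fit(data[i:i + 1], labels[i:i + 1], classes=classes)

acc = accuracy_score(labels, model.predict(data))
print("Accuracy after sequential training:", acc)
```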

# 2. Non-linearly separable data

Repeat the above exercise with data from two non-linearly separable classes (simply bring the means closer or make the standard deviations larger, so that the data from the two classes overlaps).

Play around with some of the hyperparameters, including the learning rate, and the criterion for stopping training (get training to stop when accuracy is no longer improving very much). Play around also with how much overlap there is in the data from the two classes (vary how close the means are to each other).
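As a starting point for this exploration, the sketch below varies the separation between the class means while setting a few hyperparameters explicitly. The helper `make_data`, the particular means and all hyperparameter values are hypothetical choices, not recommendations. It assumes that `early_stopping=True` is an acceptable stand-in for "stop when accuracy is no longer improving": with that setting, a fraction of the data (`validation_fraction`) is held out and training stops once the validation score fails to improve by at least `tol` for `n_iter_no_change` consecutive epochs.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score


def make_data(mean_u, rng, n=100):
    """Hypothetical helper: two unit-variance classes centred at (2, 0) and (mean_u, 0)."""
    xy = rng.normal([2, 0], 1.0, size=(n, 2))
    uv = rng.normal([mean_u, 0], 1.0, size=(n, 2))
    return np.concatenate([xy, uv]), np.repeat([0, 1], n)


rng = np.random.default_rng(0)  # arbitrary seed
results = {}

for mean_u in (6.0, 4.0, 3.0):  # smaller values give more overlap
    data, labels = make_data(mean_u, rng)
    model = Perceptron(
        eta0=0.1,                 # learning rate
        max_iter=1000,            # upper bound on the number of epochs
        tol=1e-3,                 # minimum improvement that counts as progress
        early_stopping=True,      # monitor the score on a held-out fraction
        validation_fraction=0.2,
        n_iter_no_change=5,       # epochs without improvement before stopping
        random_state=0,
    )
    model.fit(data, labels)
    acc = accuracy_score(labels, model.predict(data))
    results[mean_u] = (model.n_iter_, acc)
    print(f"mean of U = {mean_u}: epochs = {model.n_iter_}, accuracy = {acc:.3f}")
```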

What do you observe? Do you always find the best decision boundary possible? When do you get the best, or most efficient result?

## Optional extensions

1. Play around with sequential learning on the non-linearly separable data.

2. Train a perceptron to distinguish the two non-linearly separable species of iris, *versicolor* and *virginica*, on Fisher's iris dataset (see Lab 1).
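For the iris extension, a minimal sketch is given below. It assumes the copy of Fisher's iris data bundled with scikit-learn (`sklearn.datasets.load_iris`); Lab 1 may have loaded the data differently. In that copy, target values 1 and 2 correspond to *versicolor* and *virginica*. All four features are used here, and the model settings are left at their defaults apart from an arbitrary `random_state`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score, confusion_matrix

iris = load_iris()

# Keep only versicolor (target 1) and virginica (target 2).
mask = iris.target != 0
features = iris.data[mask]
species = iris.target[mask]
print("Classes kept:", [iris.target_names[k] for k in np.unique(species)])

# Train and test on the same data, as in the rest of this lab.
model = Perceptron(random_state=0)  # arbitrary seed for reproducibility
model.fit(features, species)
predictions = model.predict(features)

acc = accuracy_score(species, predictions)
cm = confusion_matrix(species, predictions)
print("Accuracy:", acc)
print(cm)
```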