Learning Machine Learning and Artificial Intelligence with Blast

Article Two - Building a Linear Classifier in pure TensorFlow

Hello Everyone! Welcome once again to the series where we’re on a journey to becoming Machine Learning Engineers and Artificial Intelligence researchers.

In the previous article, I shared the history of the field of Machine Learning and also introduced the concept of Deep Learning. In this article, we'll be writing code to build a Linear Classifier in pure TensorFlow, and I'll also introduce you to a platform that will help you get started with practical Deep Learning.

As always, by the end of this article you’ll be a step closer to becoming an Artificial Intelligence Researcher or a Machine Learning Engineer.

Okay let’s dive in!

Keras and TensorFlow are the Python-based Deep Learning tools we'll be using throughout these articles. You'll also find out how to set up a Deep Learning workspace with TensorFlow, Keras, and GPU support.

TensorFlow is a Python-based, free, open source Machine Learning platform, developed primarily by Google. Keras is a Deep Learning API for Python, built on top of TensorFlow, that provides a convenient way to define and train any kind of Deep Learning model.

We have three options for doing Deep Learning:

  • Buy and install a physical NVIDIA GPU on your workstation.

  • Use GPU instances on Google Cloud or AWS EC2.

  • Use the free GPU runtime from Colaboratory, a hosted notebook service offered by Google.

Colaboratory is the easiest way to get started: there's no need to purchase hardware or install software, just open a tab in your browser and start coding.

Colaboratory (or Colab for short) is a free Jupyter notebook service: a web page that lets you write and execute Keras scripts right away. It gives you free access to a GPU runtime, limited to about 12 hours per session.

To get started with Colab, go to https://colab.research.google.com and click the New Notebook button. Once the notebook opens, you'll notice two buttons in the toolbar: + Code and + Text. They're for creating executable Python code cells and annotation text cells, respectively.

After entering code in a code cell, pressing Ctrl+Enter (that's Ctrl+Return on a Mac keyboard, where the key is labelled Return) will execute it, or you can just click the play button at the left of the code cell. In a text cell, you can use Markdown syntax; pressing Ctrl+Enter on a text cell will render it.

To use the GPU runtime in Colab, select Runtime > Change runtime type from the menu and choose GPU as the hardware accelerator. Once you've made this change, your TensorFlow and Keras code will automatically execute on the GPU.

Take some time to get familiar with the Colab workspace: try out a hello world in a code cell and run it, as in the snippet below. Once you're ready, we'll proceed.
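Something like this will do (the print call is the classic hello world; the second print is a quick way to confirm the GPU runtime is actually active, and should list one GPU device if you changed the hardware accelerator as described above):

import tensorflow as tf

print("hello world")
# confirm that the GPU runtime is active: this should list one GPU device
print(tf.config.list_physical_devices("GPU"))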

In a Machine Learning job interview, you may be asked to implement a Linear Classifier from scratch in TensorFlow: a task that serves as a filter between candidates who have some minimal Machine Learning background and those who don't.

To solve this, first let’s come up with some nicely linearly separable synthetic data to work with: two classes of points in a 2D plane. We’ll generate each class of points by drawing their coordinates from a random distribution with a specific covariance matrix and a specific mean.

Intuitively, the covariance matrix describes the shape of the point cloud (i.e., the cluster of points), and the mean describes its position in the plane. We'll reuse the same covariance matrix for both point clouds, but we'll use two different means, i.e., the point clouds will have the same shape but different positions.

import numpy as np

num_samples_per_class = 1000
negative_samples = np.random.multivariate_normal(
    mean=[0, 3], cov=[[1, 0.5], [0.5, 1]], size=num_samples_per_class)
positive_samples = np.random.multivariate_normal(
    mean=[3, 0], cov=[[1, 0.5], [0.5, 1]], size=num_samples_per_class)

# negative_samples is the first class of points: 1000 random 2D points.
# cov=[[1, 0.5], [0.5, 1]] corresponds to an oval-like point cloud oriented from bottom left to top right.
# positive_samples is the other class of points, drawn with a different mean and the same covariance matrix.

In the above code, negative_samples and positive_samples are both arrays of shape (1000, 2), i.e., 1000 rows (points) and 2 columns (the x and y coordinates). Let's stack them into a single array of shape (2000, 2).

inputs = np.vstack((negative_samples, positive_samples)).astype(np.float32)

Next, let's generate the corresponding target labels: an array of zeros and ones of shape (2000, 1), where targets[i, 0] is 0 if inputs[i] belongs to class 0, and 1 if it belongs to class 1.

targets = np.vstack((np.zeros((num_samples_per_class, 1), dtype="float32"), 
    np.ones((num_samples_per_class, 1), dtype="float32")))

Next, let's plot our data with Matplotlib:

import matplotlib.pyplot as plt
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets[:, 0])
plt.show()

Typing all the code from the first block up to this point and running it in Colab should show our synthetic data: two classes of random points in the 2D plane.

Were you able to get that?

Now let's create a Linear Classifier that can learn to separate the two blobs. A Linear Classifier is an affine transformation (prediction = W • input + b) trained to minimize the square of the difference between the predictions and the targets, i.e., Machine Learning at its simplest.

Let’s create our variables, W and b, initialized with random values and with zeros, respectively.

W stands for weights and b stands for bias. The goal of training a Linear Classifier is to adjust these parameters (weights and bias) to minimize the difference between the predicted values and actual targets. This difference is often measured using a loss function, such as the mean squared error, which calculates the square of the difference between predictions and targets.

The weights (W) determine the importance of each input feature in making predictions, while the bias (b) allows the model to make adjustments independent of the input features. Together, they help the model learn to make accurate predictions.

import tensorflow as tf

input_dim = 2 # the inputs will be 2D points
output_dim = 1 # the output prediction will be a single score per sample
# (close to 0 if the sample is predicted to be in class 0, and close to 1 if it's predicted to be in class 1)

W = tf.Variable(initial_value=tf.random.uniform(shape=(input_dim, output_dim)))
b = tf.Variable(initial_value=tf.zeros(shape=(output_dim,)))

Next is our forward pass function. It is responsible for computing the output of the model given a set of inputs.

def model(inputs):
    # the affine transformation: prediction = W • input + b
    return tf.matmul(inputs, W) + b
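As a quick sanity check (an optional step, not part of the interview task itself), you can call the untrained model on our inputs and inspect the shape of what comes back:

preds = model(inputs)
print(preds.shape) # (2000, 1): one raw score per sample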

Then our loss function. It calculates the mean squared error between the target values and the predictions made by the model. The mean squared error is used as a measure of how well the model's predictions match the actual target values, with lower values indicating better performance.

def square_loss(targets, predictions):
    per_sample_losses = tf.square(targets - predictions)
    return tf.reduce_mean(per_sample_losses)
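You can try it out right away if you like (again, optional): the loss of the untrained model will be noticeably larger than what we'll end up with after training.

# loss before any training, using the randomly initialized W
print(square_loss(targets, model(inputs)).numpy())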

Next is the training step, which receives some training data and updates the weights (W) and bias (b) so as to minimize the loss on the data.

learning_rate = 0.1 

def training_step(inputs, targets):
    with tf.GradientTape() as tape:
        predictions = model(inputs) # forward pass
        loss = square_loss(targets, predictions) # loss function
    grad_loss_wrt_W, grad_loss_wrt_b = tape.gradient(loss, [W, b])
    W.assign_sub(grad_loss_wrt_W * learning_rate) # updates weights 
    b.assign_sub(grad_loss_wrt_b * learning_rate) # updates bias
    return loss

# training loop: each step computes the loss and gradients over the full batch of all 2,000 samples
for step in range(40):
    loss = training_step(inputs, targets)
    print(f"Loss at step {step}: {loss:.4f}")

After 40 steps of the full-batch loop, the training loss seems to have stabilized around 0.025. Let's plot how our linear model classifies the training data points. Because our targets are zeros and ones, a given input point will be classified as "0" if its prediction value is below 0.5 and as "1" if it's above 0.5.

predictions = model(inputs)
plt.scatter(inputs[:, 0], inputs[:, 1], c=predictions[:, 0] > 0.5)
plt.show()

# Let's plot the line separating the points
x = np.linspace(-1, 4, 100) # generate 100 regularly spaced numbers between -1 and 4, which we will use to plot our line
# Our model predicts w1*x + w2*y + b for a point (x, y); the decision boundary is where
# that score equals 0.5, i.e. w1*x + w2*y + b = 0.5, which rearranges to the line below
y = - W[0] / W[1] * x + (0.5 - b) / W[1]
plt.plot(x, y, "-r") # plot our line ("-r" means "plot it as a red line")
plt.scatter(inputs[:, 0], inputs[:, 1], c=predictions[:, 0] > 0.5) # plot our model's predictions on the same plot
plt.show()

Running this on Colab should give you the red line separating the two classes of points. This is what a Linear Classifier is all about: finding the parameters of a line (or, in higher-dimensional spaces, a hyperplane) neatly separating two classes of data.
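As one extra sanity check (my own addition, not part of the original task), you can also put a number on how well the line separates the data by computing the training accuracy:

# fraction of training points that land on the correct side of the 0.5 threshold
predicted_classes = (predictions[:, 0] > 0.5).numpy().astype("float32")
accuracy = np.mean(predicted_classes == targets[:, 0])
print(f"Training accuracy: {accuracy:.2%}")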

So there you have it. Be sure to run the code and explore every aspect of it for yourself. If anything is unclear, take the time to research it further.
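And since Keras came up at the start of this article: for contrast with the pure-TensorFlow approach, here's a rough sketch of what roughly the same model could look like using the Keras API (a sketch with default settings, not necessarily the exact code we'll use later):

from tensorflow import keras

keras_model = keras.Sequential([keras.layers.Dense(1)]) # a single affine layer: W • input + b
keras_model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1), loss="mean_squared_error")
keras_model.fit(inputs, targets, epochs=40, batch_size=2000) # batch_size=2000 mimics our full-batch loop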

I'll end this article here. In the next article, we'll implement binary classification to build a model that classifies movie reviews as either positive or negative.

If you want to have this article sent directly to your mail, follow this link. Also, if you'd prefer a video where I write the code in Colab and we train the model together, leave a comment or reach out via email and I'll add that to these articles.

Till next time, bye!