
Neural Network

Welcome to your next lab! You will solve the problem of handwritten digit recognition using a neural network.

You will learn to:

  • Build the general architecture of a learning algorithm with OOP in mind:
    • Helper utilities
      • Sigmoid (and its derivative)
      • One-Hot
      • Cost Function
      • Regularization
    • Neural Network Class
      • Forward propagation
      • Backward propagation
      • Update parameters
    • Main Model Classes
      • Training
      • Prediction

0 - Download data

!pip install wget
import wget
wget.download('https://dru.fra1.digitaloceanspaces.com/DS_Fundamentals/datasets/04_supervised_learning/Neural_Network/train-images-idx3-ubyte')
wget.download('https://dru.fra1.digitaloceanspaces.com/DS_Fundamentals/datasets/04_supervised_learning/Neural_Network/train-labels-idx1-ubyte')

1 - Packages

First, let's run the cell below to import all the packages that you will need during this assignment.

  • numpy is the fundamental package for scientific computing with Python.
  • matplotlib is a famous library to plot graphs in Python.
  • seaborn is a Python visualization library which provides a high-level interface for drawing attractive statistical graphics.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
sns.set(style='whitegrid', palette='muted', font_scale=1.5)

2 - Overview of the Problem set

Problem Statement: We'll use the MNIST data set, which contains tens of thousands of scanned images of handwritten digits, together with their correct classifications. MNIST's name comes from the fact that it is a modified subset of two data sets collected by NIST, the United States' National Institute of Standards and Technology.

The MNIST dataset contains 60,000 images. These images are scanned handwriting samples from 250 people, half of whom were US Census Bureau employees, and half of whom were high school students. The images are greyscale and 28 by 28 pixels in size. So, you are given a dataset containing:

  • a training set of m_train examples labeled 0-9
  • a test set of m_test examples labeled 0-9
  • each example is an array of length 784 (28 * 28) which represents an image of a handwritten digit.

You will build an algorithm that can recognize handwritten digits.
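
As a quick sanity check on that layout, here is a minimal sketch (using a hypothetical all-zeros image) of how a 28 * 28 image maps to a flat vector of length 784 and back:

import numpy as np

# A hypothetical blank 28 x 28 greyscale image
img = np.zeros((28, 28), dtype=np.uint8)

# Flatten to a length-784 vector, the layout used in the dataset
flat = img.reshape(784)
print(flat.shape)       # (784,)

# Recover the original 2-D image, e.g. for plotting
restored = flat.reshape(28, 28)
print(restored.shape)   # (28, 28)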

Let's get more familiar with the dataset. Load the data by running the following code.

# Read the MNIST dataset from ubyte files 

def read_mnist(images_path, labels_path):
    import struct
    with open(labels_path, 'rb') as p:
        # Read the magic number and number of labels
        magic, n = struct.unpack('>II', p.read(8))
        # Read the labels
        labels = np.fromfile(p, dtype=np.uint8)
    with open(images_path, 'rb') as p:
        # Read the magic number, number of images, rows, and columns
        magic, num, rows, cols = struct.unpack(">IIII", p.read(16))
        # Read the images and reshape them to (number of images, 784)
        images = np.fromfile(p, dtype=np.uint8).reshape(len(labels), 784)

    return images, labels

# Shuffle dataset

def shuffle_data(features, labels, random_seed=42):
    # Ensure the number of features matches the number of labels
    assert len(features) == len(labels)

    if random_seed is not None:
        # Set the random seed for reproducibility
        np.random.seed(random_seed)
    # Generate a random permutation of indices
    idx = np.random.permutation(len(features))
    # Shuffle features and labels with the same permutation
    return features[idx], labels[idx]

# Loading data

def load_data():     
    # Read the MNIST data
    X, y = read_mnist('train-images-idx3-ubyte', 'train-labels-idx1-ubyte')
    # Shuffle the data
    X, y = shuffle_data(X, y, random_seed=42)
    # Split the data into training and test sets
    train_set_x, train_set_y = X[:5000], y[:5000]
    test_set_x, test_set_y = X[5000:], y[5000:]
    
    # Reshape the data for compatibility with the model
    test_set_x = test_set_x.reshape(test_set_x.shape[0], -1).T
    train_set_x = train_set_x.reshape(train_set_x.shape[0], -1).T
    train_set_y = train_set_y.reshape((1, train_set_y.shape[0]))
    test_set_y = test_set_y.reshape((1, test_set_y.shape[0]))
    
    return train_set_x, test_set_x, train_set_y, test_set_y

Let's create train and test datasets:

train_set_x, test_set_x, train_set_y, test_set_y = load_data()
print('train set shapes: ', train_set_x.shape, train_set_y.shape)
print('test set shapes: ', test_set_x.shape, test_set_y.shape)

Expected Output:

train set shapes:  (784, 5000) (1, 5000)
test set shapes:  (784, 55000) (1, 55000)

Data exploration

Let's build a function to see what the data looks like:

def plot_digit(x_set, y_set, idx):
    img = x_set.T[idx].reshape(28, 28)
    plt.imshow(img, cmap='Greys', interpolation='nearest')
    plt.title('true label: %d' % y_set[0, idx])
    plt.show()

plot_digit(train_set_x, train_set_y, idx=1)
plot_digit(train_set_x, train_set_y, idx=3)

3 - Helper Functions

To begin, we need to implement a few helper functions.

Sigmoid (and its derivative)

Any layer of a neural network can be viewed as an affine transformation followed by the application of a nonlinear function. A vector is received as input and multiplied by a matrix to produce an output, to which a bias vector may be added before the result is passed through an activation function such as the sigmoid.
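
To make the shapes concrete, here is a minimal sketch of a single layer's computation, with hypothetical dimensions (3 inputs, 2 units):

import numpy as np

# Hypothetical layer: 3 inputs, 2 units
W = np.random.randn(2, 3)    # weight matrix
b = np.zeros((2, 1))         # bias vector
x = np.random.randn(3, 1)    # input vector

z = W @ x + b                # affine transformation, shape (2, 1)
a = 1 / (1 + np.exp(-z))     # sigmoid activation (defined below)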

The sigmoid function is used quite commonly in the realm of deep learning, or at least it was until recently. It has a distinct S shape, it is a differentiable real function for any real input value, and its derivative is positive at every point. Most importantly, we will use it as the activation function for the hidden layer of our model. Here is how it is defined:

σ(z) = 1 / (1 + e^(-z))

Here is its first derivative, which we will use during the backpropagation step of our training algorithm:

σ'(z) = σ(z) (1 - σ(z))
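
A minimal NumPy sketch of both functions might look like this (the function names and signatures here are our own, not necessarily the ones the lab expects):

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)); z may be a scalar or a NumPy array
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1 - s)

For example, sigmoid(0) returns 0.5 and sigmoid_derivative(0) returns 0.25, the maximum value of the derivative.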