Neural Network
Welcome to your next lab! You will solve the problem of handwritten digit recognition using a neural network.
You will learn to:
- Build the general architecture of a learning algorithm with OOP in mind:
  - Helper utilities
    - Sigmoid (and its derivative)
    - One-Hot
    - Cost Function
    - Regularization
  - Neural Network Class
    - Forward propagation
    - Backward propagation
    - Update parameters
  - Main Model Classes
    - Training
    - Prediction
- Helper utilities
0 - Download data
!pip install wget
import wget
wget.download('https://dru.fra1.digitaloceanspaces.com/DS_Fundamentals/datasets/04_supervised_learning/Neural_Network/train-images-idx3-ubyte')
wget.download('https://dru.fra1.digitaloceanspaces.com/DS_Fundamentals/datasets/04_supervised_learning/Neural_Network/train-labels-idx1-ubyte')
1 - Packages
First, let's run the cell below to import all the packages that you will need during this assignment.
- numpy is the fundamental package for scientific computing with Python.
- matplotlib is a famous library to plot graphs in Python.
- seaborn is a Python visualization library which provides a high-level interface for drawing attractive statistical graphics.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style='whitegrid', palette='muted', font_scale=1.5)
2 - Overview of the Problem set
Problem Statement: We'll use the MNIST data set, which contains tens of thousands of scanned images of handwritten digits, together with their correct classifications. MNIST's name comes from the fact that it is a modified subset of two data sets collected by NIST, the United States' National Institute of Standards and Technology.
The MNIST dataset contains 60,000 images. These images are scanned handwriting samples from 250 people, half of whom were US Census Bureau employees, and half of whom were high school students. The images are greyscale and 28 by 28 pixels in size. So, you are given a dataset containing:
- a training set of m_train examples labeled 0-9
- a test set of m_test examples labeled 0-9
- each example is an array of length 784 (28 * 28) that represents an image of a handwritten digit.
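To make the "array of length 784" concrete, here is a minimal sketch of how a 28 by 28 image maps to a flat vector and back. The random image is a stand-in, not MNIST data, and the variable names are chosen only for illustration.

```python
import numpy as np

# Hypothetical stand-in for one greyscale digit: 28 x 28 pixels, values 0..255.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten row by row into a vector of length 784, then restore the 2-D shape.
flat = image.reshape(-1)
restored = flat.reshape(28, 28)

print(flat.shape)                       # (784,)
print(np.array_equal(image, restored))  # True

# With row-major flattening, the pixel at (row, col) sits at index row * 28 + col.
print(flat[3 * 28 + 5] == image[3, 5])  # True
```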
You will build an algorithm that can recognize handwritten digits.
Let's get more familiar with the dataset. Load the data by running the following code.
# Read the MNIST dataset from ubyte files
def read_mnist(images_path, labels_path):
    import struct
    import os
    with open(labels_path, 'rb') as p:
        # Read the magic number and number of labels
        magic, n = struct.unpack('>II', p.read(8))
        # Read the labels
        labels = np.fromfile(p, dtype=np.uint8)
    with open(images_path, 'rb') as p:
        # Read the magic number, number of images, rows, and columns
        magic, num, rows, cols = struct.unpack(">IIII", p.read(16))
        # Read the images and reshape them to (number of images, 784)
        images = np.fromfile(p, dtype=np.uint8).reshape(len(labels), 784)
    return images, labels
# Shuffle dataset
def shuffle_data(features, labels, random_seed=42):
    # Ensure the number of features matches the number of labels
    assert len(features) == len(labels)
    # Compare against None so that a seed of 0 is still honoured
    if random_seed is not None:
        # Set the random seed for reproducibility
        np.random.seed(random_seed)
    # Generate a random permutation of indices
    idx = np.random.permutation(len(features))
    # Shuffle features and labels using the same indices so pairs stay aligned
    return [a[idx] for a in [features, labels]]
# Loading data
def load_data():
    # Read the MNIST data
    X, y = read_mnist('train-images-idx3-ubyte', 'train-labels-idx1-ubyte')
    # Shuffle the data
    X, y = shuffle_data(X, y, random_seed=42)
    # Split the data into training and test sets
    train_set_x, train_set_y = X[:5000], y[:5000]
    test_set_x, test_set_y = X[5000:], y[5000:]
    # Reshape the data for compatibility with the model
    test_set_x = test_set_x.reshape(test_set_x.shape[0], -1).T
    train_set_x = train_set_x.reshape(train_set_x.shape[0], -1).T
    train_set_y = train_set_y.reshape((1, train_set_y.shape[0]))
    test_set_y = test_set_y.reshape((1, test_set_y.shape[0]))
    return train_set_x, test_set_x, train_set_y, test_set_y
Let's create train and test datasets:
train_set_x, test_set_x, train_set_y, test_set_y = load_data()
print('train set shapes: ', train_set_x.shape, train_set_y.shape)
print('test set shapes: ', test_set_x.shape, test_set_y.shape)
Expected Output:
| train set shapes: | (784, 5000) (1, 5000) |
| test set shapes: | (784, 55000) (1, 55000) |
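The read_mnist function above relies on the layout of the IDX files: a header of big-endian unsigned 32-bit integers followed by raw bytes. The sketch below illustrates that layout on a tiny fake image file built in memory, so it runs without the downloaded data. The fake file and its sizes are assumptions for illustration; 2051 is the magic number used by IDX image files. Because np.fromfile does not accept in-memory buffers, the sketch uses np.frombuffer instead.

```python
import io
import struct

import numpy as np

# Build a tiny fake IDX image file in memory: 2 images of 2 x 3 pixels.
# Header: four big-endian unsigned 32-bit integers (magic, count, rows, cols).
pixels = np.arange(12, dtype=np.uint8)
fake_file = io.BytesIO(struct.pack('>IIII', 2051, 2, 2, 3) + pixels.tobytes())

# Parse it in the same order as read_mnist: header first, then the pixel bytes.
magic, num, rows, cols = struct.unpack('>IIII', fake_file.read(16))
images = np.frombuffer(fake_file.read(), dtype=np.uint8).reshape(num, rows * cols)

print(magic, num, rows, cols)  # 2051 2 2 3
print(images.shape)            # (2, 6)
```

Reading the dimensions from the header, as done here, avoids hard-coding 784 and makes a truncated or mismatched file fail at the reshape step.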
Data exploration
Let's build a function to see what the data looks like:
def plot_digit(x_set, y_set, idx):
    # Each column of x_set is one flattened image; restore its 28 x 28 shape
    img = x_set.T[idx].reshape(28, 28)
    plt.imshow(img, cmap='Greys', interpolation='nearest')
    # y_set has shape (1, m), so index the scalar label directly
    plt.title('true label: %d' % y_set[0, idx])
    plt.show()
plot_digit(train_set_x, train_set_y, idx=1)
plot_digit(train_set_x, train_set_y, idx=3)
3 - Helper Functions
To begin, we need to implement a few helper functions.
Sigmoid (and its derivative)
Any layer of a neural network can be viewed as an affine transformation followed by a non-linear function. The layer receives a vector as input and multiplies it by a weight matrix; a bias vector may be added to the product, and the result is passed through an activation function such as sigmoid.
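As a rough illustration of the paragraph above, the sketch below computes one layer for a small batch. The names W, b, X and the layer sizes are assumptions made for this example, not values prescribed by the lab; examples are stored as columns, matching the (784, m) layout returned by load_data.

```python
import numpy as np

def sigmoid(x):
    # Logistic function applied element-wise
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical sizes: 784 inputs, 30 hidden units, 5 examples (one per column).
n_in, n_out, m = 784, 30, 5
X = rng.random((n_in, m))
W = rng.standard_normal((n_out, n_in)) * 0.01  # weight matrix
b = np.zeros((n_out, 1))                       # bias, broadcast across columns

Z = W @ X + b   # affine transformation
A = sigmoid(Z)  # non-linear activation

print(A.shape)  # (30, 5)
```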
The sigmoid function was, until recently, one of the most common activation functions in deep learning. It has a distinctive S shape, is differentiable for every real input, and has a positive derivative at every point. We will use it as the activation function for the hidden layer of our model. It is defined as:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Its first derivative, which we will use during the backpropagation step of our training algorithm, has the following formula:
$$\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$$
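One possible NumPy sketch of the sigmoid and its derivative is shown below, together with a finite-difference check that the derivative matches the slope of the function. The name sigmoid_derivative is chosen for this example only and may differ from what the lab expects.

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)), applied element-wise
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25

# Compare against a central finite difference at a few points.
x = np.array([-2.0, 0.0, 2.0])
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(np.allclose(numeric, sigmoid_derivative(x)))  # True
```

For very negative inputs np.exp(-x) can overflow and emit a warning; this sketch ignores that case for brevity.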