
Convolutional Neural Networks with TensorFlow

In this tutorial, you'll learn how to construct and implement Convolutional Neural Networks (CNNs) in Python with the TensorFlow framework.

TensorFlow is a popular deep learning framework. In this tutorial, you will learn the basics of this Python library and understand how to implement these deep, feed-forward artificial neural networks with it.

To be precise, you'll be introduced to the following topics in today's tutorial:

- You'll first be introduced to tensors and how they differ from matrices. Once you understand what tensors are, you'll be introduced to the TensorFlow framework and see how even a single line of code is implemented via a computational graph. Then you will learn about some of the package's concepts that play a major role in deep learning: constants, variables, and placeholders.

- Then, you'll be headed to the most interesting part of this tutorial. That is the implementation of the Convolutional Neural Network: first, you will try to understand the data. You'll use Python and its libraries to load, explore, and analyze your data. You'll also preprocess your data: you’ll learn how to visualize your images as a matrix, reshape your data and rescale the images between 0 and 1 if required.

- With all of this done, you are ready to construct the deep neural network model. You'll start by defining the network parameters, then learn how to create wrappers to increase the simplicity of your code, define weights and biases, model the network, define loss and optimizer nodes. Once you have all this in place, you are ready for training and testing your model.

- Finally, you will learn to work with your own dataset. In this section, you will download the CIFAR-10 dataset from Kaggle and load the images and labels using Python modules like glob and pandas. You will read the images using OpenCV, one-hot encode the class labels, visualize the images with labels, normalize the images, and finally split the dataset into train and test sets.


In layman's terms, a tensor is a way of representing the data in deep learning. A tensor can be a 1-dimensional, a 2-dimensional, a 3-dimensional array, etc. You can think of a tensor as a multidimensional array. In machine learning and deep learning, you have datasets that are high dimensional, in which each dimension represents a different feature of that dataset.

Consider the following example of a dog versus cat classification problem, where the dataset you're working with has images of multiple varieties of both cats and dogs. Now, in order to correctly classify a dog or a cat when given an image, the network has to learn discriminative features like color, face structure, ears, eyes, the shape of the tail, etc.

These features are incorporated by the tensors.

Tip: if you want to learn more about tensors, check out DataCamp's TensorFlow Tutorial for Beginners.

But how are tensors then any different from matrices? You'll find out in the next section!

Tensors versus Matrices: Differences

A matrix is a two-dimensional grid of size $n×m$ that contains numbers: you can add and subtract matrices of the same size, multiply one matrix with another as long as the sizes are compatible $((n×m)×(m×p)=n×p)$, and multiply an entire matrix by a constant.

A vector is a matrix with just one row or column (but see below).

A tensor is often thought of as a generalized matrix. That is, it could be

  • a 1-D matrix (a vector is such a tensor),

  • a 3-D matrix (something like a cube of numbers),

  • a 0-D matrix (a single number), or

  • a higher dimensional structure that is harder to visualize.

The dimension of the tensor is called its rank.

Any rank-2 tensor can be represented as a matrix, but not every matrix is a rank-2 tensor. The numerical values of a tensor’s matrix representation depend on what transformation rules have been applied to the entire system.
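To make the idea of rank concrete, here is a minimal sketch that uses NumPy arrays as stand-ins for tensors in the array sense used by deep learning libraries. NumPy calls the rank `ndim`; the variable names are illustrative only and do not appear elsewhere in this tutorial.

```python
import numpy as np

scalar = np.array(5.0)                 # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])     # rank 1
matrix = np.ones((2, 3))               # rank 2
cube = np.zeros((2, 3, 4))             # rank 3: a "cube" of numbers

tensors = (scalar, vector, matrix, cube)
ranks = [t.ndim for t in tensors]      # number of dimensions of each array
shapes = [t.shape for t in tensors]    # size along each dimension
print(ranks)
print(shapes)
```

Note that the rank is the number of axes, not the number of elements: the rank-3 array above holds 24 numbers.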

TensorFlow: Constants, Variables, and Placeholders

TensorFlow is a framework developed by Google and released on 9th November 2015. It is written in Python, C++, and CUDA, and it supports platforms like Linux, Microsoft Windows, macOS, and Android. TensorFlow provides APIs in multiple languages, such as Python, C++, and Java. The Python API is the most widely used, and you will use it to implement a convolutional neural network in this tutorial.

The name TensorFlow is derived from the operations, such as adding or multiplying, that artificial neural networks perform on multidimensional data arrays. These arrays are called tensors in this framework, which is slightly different from what you saw earlier.

So why is there a mention of a flow when you're talking about operations?

Let's consider a simple equation and its diagram, represented as a computational graph. Note: don't worry if you don't get this equation straight away, this is just to help you to understand how the flow takes place while using the TensorFlow framework.

prediction = tf.nn.softmax(tf.matmul(W,x) + b)

In TensorFlow, every line of code that you write has to go through a computational graph. As in the above figure, you can see that $W$ and $x$ get multiplied first. Then $b$ is added to their product. Finally, a softmax function is applied to the sum, and the final output is generated.
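The same flow can be traced step by step outside TensorFlow. The following is a rough NumPy sketch, not TensorFlow code: the `softmax` helper and the small values chosen for `W`, `x`, and `b` are assumptions made purely for illustration.

```python
import numpy as np

def softmax(z):
    # Subtracting the maximum keeps exp() numerically stable
    # and does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])       # 3 classes, 2 input features (made-up values)
x = np.array([0.5, -0.5])
b = np.array([0.0, 0.0, 0.1])

product = W @ x                  # step 1: the matmul node
logits = product + b             # step 2: the add node
prediction = softmax(logits)     # step 3: the softmax node
print(prediction)
```

Each intermediate name corresponds to one node of the graph, and the data "flows" from one node to the next.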

You'll find that when you're working with TensorFlow, constants, variables, and placeholders come in handy to define the input data, class labels, weights, and biases.

  • Constants take no input; you use them to store constant values, and they produce that stored value as their output.
import tensorflow as tf
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b

Here, nodes a and b are constants that store the values 2.0 and 3.0. Node c stores the operation that multiplies nodes a and b. When you initialize a session and run c, you'll see that the output that you get back is 6.0:

sess = tf.Session()
print(sess.run(c))
  • Placeholders allow you to feed input at run time. This flexibility allows your computational graph to take inputs as parameters. Defining a node as a placeholder tells the graph that the node will receive its value later, at runtime. Here, "runtime" means that the input is fed to the placeholder when you run your computational graph.
# Creating placeholders
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)

# Assigning addition operation w.r.t. a and b to node add
add = a + b

# Create session object
sess = tf.Session()

# Executing add by passing the values [1, 3] [2, 4] for a and b respectively
output = sess.run(add, {a: [1,3], b: [2, 4]})
print('Adding a and b:', output)
print('Datatype:', output.dtype)
Adding a and b: [3. 7.]
Datatype: float32

In this case, you have explicitly provided the data type with tf.float32. This is a single-precision floating-point type, stored in 32 bits. However, in cases where you do not do this, just like in the first example, TensorFlow will infer the type of the constant/variable from the initialized value.

  • Variables allow you to modify the graph such that it can produce new outputs with respect to the same inputs. A variable allows you to add trainable parameters or nodes to the graph. That is, the value can be modified over time.

#Variables are defined by providing their initial value and type
variable = tf.Variable([0.9,0.7], dtype = tf.float32)

#variable must be initialized before a graph is used for the first time.
init = tf.global_variables_initializer()

Constants are initialized when you call tf.constant, and their value can never change. Variables, however, are not initialized when you call tf.Variable. To initialize all the variables in TensorFlow, you need to explicitly call the global variable initializer global_variables_initializer(), which initializes all the existing variables in your TensorFlow code, as you can see in the above code chunk.

Variables survive across multiple executions of a graph, unlike normal tensors that are only instantiated when a graph is run and are immediately deleted afterward.

In this section, you have seen that placeholders are used for holding the input data and class labels, whereas variables are used for weights and biases. Don't worry if you have not yet developed a proper intuition about how a computational graph works or what placeholders and variables are typically used for in deep learning. You will address all these topics later on in this tutorial.

Convolutional Neural Network (CNN) in TensorFlow

Fashion-MNIST Dataset

Before you go ahead and load in the data, it's good to take a look at what exactly you'll be working with! The Fashion-MNIST dataset contains Zalando's article images: 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The official training set has 60,000 images, and the test set has 10,000 images. The loader used in this tutorial holds out 5,000 of the training images for validation by default, which is why you will see 55,000 training images below. You can double-check this later when you have loaded in your data! ;)

Fashion-MNIST is similar to the MNIST dataset that you might already know, which you use to classify handwritten digits. That means that the image dimensions, training, and test splits are similar.

Tip: if you want to learn how to implement a Multi-Layer Perceptron (MLP) for classification tasks with this latter dataset, go to this tutorial, or if you want to learn about convolutional neural networks and their implementation in the Keras framework, check out this tutorial.

You can find the Fashion-MNIST dataset here. Unlike the Keras or Scikit-Learn packages, TensorFlow has no predefined module to load the Fashion-MNIST dataset, though it has an MNIST dataset by default. To load the data, you first need to download the data from the above link and then structure the data in a particular folder format, as shown below, to be able to work with it. Otherwise, TensorFlow will download and use the original MNIST.

Load the data

You start by importing all the required modules like NumPy, matplotlib, and, most importantly, TensorFlow.

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
%matplotlib inline
import os
# os.environ["CUDA_VISIBLE_DEVICES"]="0" #for training on gpu

After importing all the modules, you will now learn how to load data in TensorFlow, which should be pretty straightforward. The only thing that you should take into account is the one_hot=True argument, which you'll also find in the line of code below: it converts the categorical class labels to binary vectors.

In one-hot encoding, you convert the categorical data into a vector of numbers. You do this because machine learning algorithms can't work with categorical data directly. Instead, you generate one boolean column for each category or class. Only one of these columns can take on the value 1 for each sample. That explains the term "one-hot encoding".

But what does such a one-hot encoded data column look like?

For your problem statement, the one-hot encoding will be a row vector, and for each image, it will have a dimension of 1 x 10. It's important to note here that the vector consists of all zeros except for the class that it represents. There, you'll find a 1. For example, an ankle boot image has a label of 9, so for all the ankle boot images, the one-hot encoding vector would be [0 0 0 0 0 0 0 0 0 1].
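As a small illustration of what one_hot=True produces, here is a NumPy sketch of one-hot encoding. The `one_hot` helper is hypothetical and written for this example only; it is not the function the TensorFlow loader uses internally.

```python
import numpy as np

def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing the identity matrix with the labels encodes them all at once.
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([9, 0, 4])            # ankle boot, T-shirt/top, coat
encoded = one_hot(labels, num_classes=10)
print(encoded[0])                       # the vector for label 9

# argmax recovers the integer label from the one-hot vector.
decoded = encoded.argmax(axis=1)
print(decoded)
```

The argmax step is the same trick used later in this tutorial to turn a one-hot label back into a class index.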

Now that all of this is clear, it's time to import the data!

data = input_data.read_data_sets('data/fashion',one_hot=True,\
WARNING:tensorflow:From /Users/adityasharma/Library/Python/3.7/lib/python/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: _internal_retry.<locals>.wrap.<locals>.wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 26421880 bytes.
Extracting data/fashion1/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 29515 bytes.
Extracting data/fashion1/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 4422102 bytes.
Extracting data/fashion1/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 5148 bytes.
Extracting data/fashion1/t10k-labels-idx1-ubyte.gz

Note: If you have trouble loading the Fashion-MNIST dataset with the above method, kindly refer to this repository, which shows a handful of ways in which you can load your dataset.

Once you have the training and testing data loaded, you're all set to analyze the data to get some intuition about the dataset that you are going to work with for this tutorial!

Analyze the Data

Before you start any heavy lifting, it's always a good idea to check out what the images in the dataset look like. First, you can take a programmatic approach and check out their dimensions. Also, take into account that if you want to explore your images, these have already been rescaled between 0 and 1. That means that you would not need to rescale the image pixels again!

# Shapes of training set
print("Training set (images) shape: {shape}".format(shape=data.train.images.shape))
print("Training set (labels) shape: {shape}".format(shape=data.train.labels.shape))

# Shapes of test set
print("Test set (images) shape: {shape}".format(shape=data.test.images.shape))
print("Test set (labels) shape: {shape}".format(shape=data.test.labels.shape))
Training set (images) shape: (55000, 784)
Training set (labels) shape: (55000, 10)
Test set (images) shape: (10000, 784)
Test set (labels) shape: (10000, 10)

From the above output, you can see that the training data has a shape of 55000 x 784: there are 55,000 training samples, each a 784-dimensional vector. Similarly, the test data has a shape of 10000 x 784, since there are 10,000 testing samples.

The 784-dimensional vector is nothing but a 28 x 28-dimensional matrix. That's why you will be reshaping each training and testing sample from a 784-dimensional vector to a 28 x 28 x 1-dimensional matrix in order to feed the samples into the CNN model.

For simplicity, let's create a dictionary that will have class names with their corresponding categorical class labels.

# Create dictionary of target classes
label_dict = {
 0: 'T-shirt/top',
 1: 'Trouser',
 2: 'Pullover',
 3: 'Dress',
 4: 'Coat',
 5: 'Sandal',
 6: 'Shirt',
 7: 'Sneaker',
 8: 'Bag',
 9: 'Ankle boot',
}

Also, let's take a look at a couple of images in the dataset:


plt.figure(figsize=[5,5])

# Display the first image in training data (left subplot)
plt.subplot(121)
curr_img = np.reshape(data.train.images[0], (28,28))
curr_lbl = np.argmax(data.train.labels[0,:])
plt.imshow(curr_img, cmap='gray')
plt.title("(Label: " + str(label_dict[curr_lbl]) + ")")

# Display the first image in testing data (right subplot)
plt.subplot(122)
curr_img = np.reshape(data.test.images[0], (28,28))
curr_lbl = np.argmax(data.test.labels[0,:])
plt.imshow(curr_img, cmap='gray')
plt.title("(Label: " + str(label_dict[curr_lbl]) + ")")
Text(0.5, 1.0, '(Label: Ankle boot)')

The above two plots show one sample image from each of the training and testing data, and these images are assigned class labels of 4 (Coat) and 9 (Ankle boot). Similarly, other fashion products will have different labels, but similar products will have the same labels. This means that all ankle boot images will have a class label of 9.

Data Preprocessing

The images are of size 28 x 28 (or a 784-dimensional vector).

The images are already rescaled between 0 and 1, so you don't need to rescale them again, but to be sure, let's visualize an image from the training dataset as a matrix. Along with that, let's also print the maximum and minimum value of the matrix.

array([0.40784317, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.03921569, 0.9568628 ,
       0.8588236 , 0.9803922 , 0.80392164, 0.7803922 , 0.8196079 ,
       0.79215693, 0.8196079 , 0.82745105, 0.7411765 , 0.83921576,
       0.8078432 , 0.8235295 , 0.7843138 , 0.8313726 , 0.6039216 ,
       0.94117653, 0.81568635, 0.8588236 , 0.54901963, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.08235294, 1.        , 0.8705883 , 0.9333334 ,
       0.72156864, 0.8235295 , 0.75294125, 0.8078432 , 0.8196079 ,
       0.8235295 , 0.7411765 , 0.8352942 , 0.82745105, 0.8196079 ,
       0.75294125, 0.8941177 , 0.60784316, 0.8862746 , 0.9333334 ,
       0.9450981 , 0.6509804 , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.14509805,
       0.9607844 , 0.8862746 , 0.9450981 , 0.5882353 , 0.7725491 ,
       0.7411765 , 0.8000001 , 0.8196079 , 0.8235295 , 0.7176471 ,
       0.8352942 , 0.8352942 , 0.78823537, 0.72156864, 0.8431373 ,
       0.57254905, 0.8470589 , 0.92549026, 0.882353  , 0.6039216 ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.227451  , 0.93725497, 0.89019614,
       1.        , 0.61960787, 0.7568628 , 0.76470596, 0.8000001 ,
       0.8196079 , 0.8352942 , 0.7058824 , 0.8117648 , 0.85098046,
       0.7803922 , 0.7607844 , 0.82745105, 0.61960787, 0.8588236 ,
       0.92549026, 0.8470589 , 0.5921569 , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.26666668, 0.91372555, 0.8862746 , 0.95294124, 0.54509807,
       0.7843138 , 0.7568628 , 0.80392164, 0.8235295 , 0.81568635,
       0.7058824 , 0.80392164, 0.8313726 , 0.7960785 , 0.7686275 ,
       0.8470589 , 0.6156863 , 0.7019608 , 1.        , 0.8470589 ,
       0.60784316, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.31764707, 0.882353  ,
       0.87843144, 0.82745105, 0.5411765 , 0.8588236 , 0.7254902 ,
       0.78823537, 0.8352942 , 0.8117648 , 0.7725491 , 0.8862746 ,
       0.8313726 , 0.7843138 , 0.74509805, 0.8431373 , 0.7176471 ,
       0.3529412 , 1.        , 0.82745105, 0.5764706 , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.35686275, 0.8235295 , 0.90196085, 0.61960787,
       0.44705886, 0.80392164, 0.73333335, 0.81568635, 0.8196079 ,
       0.8078432 , 0.7568628 , 0.8235295 , 0.82745105, 0.8000001 ,
       0.76470596, 0.8000001 , 0.70980394, 0.09019608, 1.        ,
       0.8352942 , 0.61960787, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.34117648,
       0.80392164, 0.909804  , 0.427451  , 0.6431373 , 1.        ,
       0.83921576, 0.87843144, 0.8705883 , 0.8235295 , 0.7725491 ,
       0.83921576, 0.882353  , 0.8705883 , 0.82745105, 0.86274517,
       0.85098046, 0.        , 0.9176471 , 0.8470589 , 0.6627451 ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.36078432, 0.8352942 , 0.909804  ,
       0.57254905, 0.01960784, 0.5254902 , 0.5921569 , 0.63529414,
       0.6666667 , 0.7176471 , 0.7137255 , 0.6431373 , 0.6509804 ,
       0.69803923, 0.63529414, 0.6117647 , 0.38431376, 0.        ,
       0.94117653, 0.882353  , 0.8235295 , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.16862746, 0.6431373 , 0.8078432 , 0.5529412 , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.49803925, 0.4901961 ,
       0.29803923, 0.        , 0.        , 0.        ], dtype=float32)
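The array above is notebook output from the tutorial's dataset. As a rough sketch of the same range check, the snippet below uses a synthetic 8-bit image as a stand-in (an assumption, since the dataset is not loaded here) and shows how dividing by 255 puts the pixel values between 0 and 1.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for one flattened 28 x 28 image with 8-bit pixels.
raw = rng.integers(0, 256, size=784).astype(np.uint8)
raw[0], raw[1] = 0, 255               # make sure both extremes occur

# Rescale to [0, 1]; already-rescaled data like data.train.images skips this step.
scaled = raw.astype(np.float32) / 255.0
print(scaled.min(), scaled.max())

as_matrix = scaled.reshape(28, 28)    # view the flat vector as a 28 x 28 matrix
print(as_matrix.shape)
```

If the minimum and maximum already lie in the range 0 to 1, as they do for the array above, no further rescaling is needed.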

Let us reshape the images so that each one is of size 28 x 28 x 1, and feed this as an input to the network.

The reason you need to reshape your data is that TensorFlow expects a certain input shape for a convolutional neural network, specifically:

(<number of images>, <image x_dim>, <image y_dim>, <number of channels>)

The dataset class used here apparently yields a flattened (list-like) shape for these images, so the reshape command puts the data structure into a type that the TF class can work with.

# Reshape training and testing image
train_X = data.train.images.reshape(-1, 28, 28, 1)
test_X = data.test.images.reshape(-1,28,28,1)
train_X.shape, test_X.shape
((55000, 28, 28, 1), (10000, 28, 28, 1))

You need not reshape the labels since they already have the correct dimensions, but let us put the training and testing labels in separate variables and also print their respective shapes just to be on the safer side.

train_y = data.train.labels
test_y = data.test.labels
train_y.shape, test_y.shape
((55000, 10), (10000, 10))

The Deep Neural Network

You'll use three convolutional layers:

  • The first layer will have 32 filters of size 3 x 3,
  • The second layer will have 64 filters of size 3 x 3, and
  • The third layer will have 128 filters of size 3 x 3.

In addition, there are three max-pooling layers, each of the size 2 x 2.

You start with defining the training iterations training_iters, the learning rate learning_rate, and the batch size batch_size. Keep in mind that all these are hyperparameters and that these don't have fixed values, as these differ for every problem statement.

Nevertheless, here's what you usually can expect:

  • Training iterations indicate the number of times you train your network on the training data,
  • It is a good practice to start with a learning rate of 1e-3. The learning rate is a factor that is multiplied with the gradients to determine how much the weights get updated at each step; this helps in reducing the cost/loss/cross-entropy and ultimately in converging to a local optimum. The learning rate should be neither too high nor too low.
  • The batch size means that your training images will be divided into batches of a fixed size, and at every step the network takes one batch of images and trains on it. Using a batch size that is a power of 2 is a common convention, often motivated by how memory and processors are organized in hardware. Also, taking a very large batch size can lead to memory errors, so you have to make sure that the machine you run your code on has sufficient RAM to handle the specified batch size.
training_iters = 10
learning_rate = 0.001
batch_size = 128
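To get a feel for these numbers, here is a back-of-the-envelope sketch in plain Python. It counts the mini-batches in one pass over the 55,000 training images and applies a single plain gradient-descent step to the toy function f(w) = w**2. The toy function is an assumption chosen for illustration; the tutorial itself uses the Adam optimizer, which adapts the step per parameter.

```python
import math

num_train = 55000        # training images reported by the loader in this tutorial
batch_size = 128
learning_rate = 0.001

# Mini-batches needed to see every training image once;
# the last batch is smaller than the others.
batches_per_epoch = math.ceil(num_train / batch_size)
last_batch = num_train - (batches_per_epoch - 1) * batch_size

# One gradient-descent step on f(w) = w**2, whose gradient is 2*w.
# The learning rate scales the gradient, not the weight itself.
w = 1.0
grad = 2 * w
w_new = w - learning_rate * grad
print(batches_per_epoch, last_batch, w_new)
```

With a learning rate of 0.001, the weight moves only slightly toward the minimum at 0, which is why many such steps are needed.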

Network Parameters

Next, you need to define the network parameters. Firstly, you define n_input, the side length of the input image. This is 28 since each 784-dimensional vector is reshaped to a 28 x 28 x 1 matrix before it is fed to the network. Secondly, you'll also define the number of classes, which is simply the number of class labels.

# Fashion-MNIST data input (img shape: 28*28)
n_input = 28

# Fashion-MNIST total classes (10 categories, labels 0-9)
n_classes = 10

Now is the time to use those placeholders, about which you read previously in this tutorial. You will define an input placeholder x, which will have a dimension of None x 28 x 28 x 1, and an output placeholder y with a dimension of None x 10. To reiterate, placeholders allow you to do operations and build your computation graph without feeding in data.

The placeholder y will hold the labels of the training images in the form of a None x 10 matrix.

The first dimension is None, which means it is left unspecified and can take any batch size when you feed in data. Since you set the batch size to 128, this will be the first dimension of the placeholders during training.

#both placeholders are of type float
x = tf.placeholder("float", [None, 28,28,1])
y = tf.placeholder("float", [None, n_classes])

Creating wrappers for simplicity

In your network architecture model, you will have multiple convolution and max-pooling layers. In such cases, it's always a better idea to define convolution and max-pooling functions, so that you can call them as many times as you want in your network.

  • In the conv2d() function, you pass 4 arguments: input x, weights W, bias b, and strides. This last argument is by default set to 1, but you can always play with it to see how the network performs. The first and last elements of the strides list must always be 1 because the first is for the image number and the last is for the input channel: the filter should not skip over images in the batch or over channels. After applying the convolution, you add the bias and apply an activation function called Rectified Linear Unit (ReLU).
  • The max-pooling function is simple: it has the input x and a kernel size k, which is set to be 2. This means that the max-pooling filter will be a square matrix with dimensions 2 x 2, and the stride by which the filter will move in is also 2.

You will use padding equal to 'SAME', which ensures that the boundary pixels of the image are not left out while performing the convolution operations: it adds zeros at the boundaries of the input and allows the convolution filter to access the boundary pixels as well.

Similarly, in the max-pooling operation, padding equal to 'SAME' will add zeros where needed. Later, when you define the weights and the biases, you will notice that an input of size 28 x 28 is downsampled to 4 x 4 after applying three max-pooling layers.

def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],padding='SAME')
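The 28 x 28 to 4 x 4 downsampling can be checked with a few lines of plain Python. With 'SAME' padding, the output size along each spatial dimension is the input size divided by the stride, rounded up. The helper name `same_output_size` is made up for this sketch.

```python
import math

def same_output_size(size, stride):
    # With padding='SAME', output size = ceil(input size / stride).
    return math.ceil(size / stride)

size = 28
sizes = [size]
for _ in range(3):                      # three conv + max-pool stages
    size = same_output_size(size, 1)    # 3 x 3 convolution, stride 1: size unchanged
    size = same_output_size(size, 2)    # 2 x 2 max-pooling, stride 2: size halved, rounded up
    sizes.append(size)
print(sizes)
```

The rounding up at the last stage (7 to 4) is where the zero padding of the max-pooling operation comes into play.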

After you have defined the conv2d and maxpool2d wrappers, now you can define your weights and biases variables. So, let's get started!

But first, let's understand each weight and bias parameter step by step. You will create two dictionaries, one for weight and the second for the bias parameter.

  • Recall from the above figure that the first convolution layer has 32 filters of size 3 x 3, so the first key (wc1) in the weight dictionary has an argument shape that takes a tuple with 4 values: the first and second are the filter size, while the third is the number of channels in the input image and the last represents the number of convolution filters you want in the first convolution layer. The first key in the biases dictionary, bc1, will have 32 bias parameters.
  • Similarly, the second key (wc2) of the weight dictionary has a shape parameter that will take a tuple with 4 values: the first and second again refer to the filter size, and the third represents the number of channels from the previous output. Since you pass 32 convolution filters on the input image, you will have 32 channels as an output from the first convolution layer operation. The last represents the number of filters you want in the second convolution layer. Note that the second key in the biases dictionary, bc2, will have 64 parameters.

You will do the same for the third convolution layer.

  • Now, it's important to understand the fourth key (wd1). After applying 3 convolution and max-pooling operations, you have downsampled the input image from 28 x 28 x 1 to 4 x 4 x 128, and now you need to flatten this downsampled output to feed it as input to the fully connected layer. That's why you do the multiplication $4×4×128$: $4×4$ is the spatial size of the previous layer's output, and 128 is the number of channels that are outputted by convolution layer 3. The second element of the tuple that you pass to shape is the number of neurons that you want in the fully connected layer. Similarly, in the biases dictionary, the fourth key bd1 has 128 parameters.

You will follow the same logic for the last fully connected layer, in which the number of neurons will be equivalent to the number of classes.

weights = {
    'wc1': tf.get_variable('W0', shape=(3,3,1,32), initializer=tf.contrib.layers.xavier_initializer()),
    'wc2': tf.get_variable('W1', shape=(3,3,32,64), initializer=tf.contrib.layers.xavier_initializer()),
    'wc3': tf.get_variable('W2', shape=(3,3,64,128), initializer=tf.contrib.layers.xavier_initializer()),
    'wd1': tf.get_variable('W3', shape=(4*4*128,128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('W6', shape=(128,n_classes), initializer=tf.contrib.layers.xavier_initializer()),
}
biases = {
    'bc1': tf.get_variable('B0', shape=(32), initializer=tf.contrib.layers.xavier_initializer()),
    'bc2': tf.get_variable('B1', shape=(64), initializer=tf.contrib.layers.xavier_initializer()),
    'bc3': tf.get_variable('B2', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'bd1': tf.get_variable('B3', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('B4', shape=(10), initializer=tf.contrib.layers.xavier_initializer()),
}
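To see where most of the trainable parameters sit, the following plain-Python sketch multiplies out the shapes used in the two dictionaries above. It only does arithmetic on the shape tuples and creates no TensorFlow variables; the dictionary names `weight_shapes` and `bias_shapes` are introduced here for illustration.

```python
from math import prod

# Shapes copied from the weights and biases dictionaries above.
weight_shapes = {
    'wc1': (3, 3, 1, 32),
    'wc2': (3, 3, 32, 64),
    'wc3': (3, 3, 64, 128),
    'wd1': (4 * 4 * 128, 128),
    'out': (128, 10),
}
bias_shapes = {'bc1': (32,), 'bc2': (64,), 'bc3': (128,), 'bd1': (128,), 'out': (10,)}

# Number of parameters in a tensor = product of its dimensions.
weight_counts = {name: prod(shape) for name, shape in weight_shapes.items()}
bias_total = sum(prod(shape) for shape in bias_shapes.values())
total = sum(weight_counts.values()) + bias_total
print(weight_counts)
print(total)
```

The fully connected layer wd1 accounts for the bulk of the parameters, which is typical for small CNNs of this shape.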

Now, it's time to define the network architecture! Unfortunately, this is not as simple as it is in the Keras framework!

The conv_net() function takes 3 arguments as an input: the input x and the weights and biases dictionaries. Again, let's go through the construction of the network step by step:

  • Firstly, the network receives its input as a matrix of size 28 x 28 x 1. As you saw earlier, the images are loaded as 784-dimensional vectors, which you reshaped during preprocessing with reshape(-1, 28, 28, 1). The -1 in the reshape() function means that it will infer the first dimension on its own, but the rest of the dimensions are fixed, that is, 28 x 28 x 1.
  • Next, as shown in the figure of the architecture of the model, you will define conv1, which takes input as an image, weights wc1, and biases bc1. Next, you apply max-pooling on the output of conv1, and you will perform a process analogous to this until conv3.
  • Your task is to classify: given an image, decide which class label it belongs to. So, after you pass through all the convolution and max-pooling layers, you will flatten the output of conv3. Next, you'll connect the flattened conv3 neurons with each and every neuron in the next layer. Then you will apply an activation function on the output of the fully connected layer fc1.

Finally, in the last layer, you will have 10 neurons since you have to classify 10 labels. That means that you will connect all the neurons of fc1 in the output layer with 10 neurons in the last layer.

def conv_net(x, weights, biases):  

    # here we call the conv2d function we had defined above and pass the input image x, weights wc1 and bias bc1.
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling), this chooses the max value from a 2*2 matrix window and outputs a 14*14 matrix.
    conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    # here we call the conv2d function we had defined above and pass the output of the first stage conv1, weights wc2 and bias bc2.
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling), this chooses the max value from a 2*2 matrix window and outputs a 7*7 matrix.
    conv2 = maxpool2d(conv2, k=2)

    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    # Max Pooling (down-sampling), this chooses the max value from a 2*2 matrix window and outputs a 4*4.
    conv3 = maxpool2d(conv3, k=2)

    # Fully connected layer
    # Reshape conv3 output to fit fully connected layer input
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Output, class prediction
    # finally we multiply the fully connected layer with the weights and add a bias term.
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

Loss and Optimizer Nodes

You will start with constructing a model and call the conv_net() function by passing in input x, weights, and biases. Since this is a multi-class classification problem, you will use softmax activation on the output layer. This will give you probabilities for each class label. The loss function you use is cross-entropy.

You use cross-entropy as the loss function because its value is always positive and tends toward zero as the network gets better at computing the desired output, y, for all training inputs, x. These are both properties you would intuitively expect of a cost function. It also avoids the learning slowdown problem: if the weights and biases are poorly initialized, the network still recovers quickly, so the training phase is not hampered much.

In TensorFlow, you define both the softmax activation and the cross-entropy loss in one line. You pass two arguments: the predicted output (the logits) and the ground truth label y. You then take the mean (reduce_mean), which computes the mean loss over all instances in a single batch, not the average over all batches, because you train your model in mini-batches.
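
To make the behaviour of this loss concrete, here is a small pure-Python sketch of softmax followed by cross-entropy, averaged over a batch of two examples. It only illustrates the arithmetic and is not how TensorFlow implements the op. The scores are made up, and the function names softmax and cross_entropy are hypothetical helpers.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

label = [0.0, 1.0, 0.0]                        # true class is index 1
poor = cross_entropy([2.0, 0.0, 1.0], label)   # model favours the wrong class
good = cross_entropy([0.0, 8.0, 0.0], label)   # model is confident and correct

batch_mean = (poor + good) / 2  # the role reduce_mean plays for a batch of two
print(poor, good, batch_mean)
```

The loss stays positive in both cases, but it is much smaller when the highest score belongs to the true class.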

Next, you define one of the most popular optimization algorithms: the Adam optimizer. You can read more about the optimizer from here. You specify the learning rate and tell the optimizer to minimize the cost you calculated in the previous step.

pred = conv_net(x, weights, biases)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
WARNING:tensorflow:From <ipython-input-23-989f812044df>:3: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

Evaluate Model Node

To test your model, let's define two more nodes: correct_prediction and accuracy. They let you evaluate your model after every training iteration, which helps you keep track of its performance. After every iteration, the model is tested on the 10,000 testing images, which it never sees during the training phase.

You can always save the graph and run the testing part later as well. But for now, you will test within the session.

#Here, you check whether the index of the maximum value of the predicted output equals the index of the actual label. Both argmax results are vectors with one entry per image.
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))

#calculate accuracy across all the given images and average them out.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
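
The same computation can be traced by hand in plain Python. The sketch below mirrors the argmax, equal, cast, and mean steps on three made-up rows of scores; the values are illustrative only, and the names argmax, pred_rows, and label_rows are hypothetical.

```python
def argmax(values):
    """Index of the largest entry, like tf.argmax along one row."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up scores for three images and three classes (illustrative only).
pred_rows = [[0.1, 2.3, 0.4],
             [1.9, 0.2, 0.3],
             [0.2, 0.1, 0.6]]
label_rows = [[0, 1, 0],
              [0, 0, 1],
              [0, 0, 1]]

# Compare predicted and true class indices, then average the 0/1 results.
correct = [argmax(p) == argmax(t) for p, t in zip(pred_rows, label_rows)]
accuracy_value = sum(float(c) for c in correct) / len(correct)
print(correct, accuracy_value)
```

Two of the three rows match here, so the accuracy is two thirds.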

Remember that your weights and biases are variables and that you have to initialize them before you can make use of them. So let's do that with the following line of code:

# Initializing the variables
init = tf.global_variables_initializer()

Training and Testing the Model

When you train and test your model in TensorFlow, you go through the following steps:

  • You start by launching the graph in a session. tf.Session is the class that runs all the TensorFlow operations, and every operation that uses the session has to sit inside the indented with block.
  • Then, you run the session on init, which initializes the variables you defined in the previous step.
  • Next, you define a for loop that runs for the number of training iterations you specified in the beginning.
  • Right after that, you start a second for loop over the batches. The number of batches is the total number of images divided by the batch size you chose.
  • You then slice out the images for the current batch into batch_x and their respective labels into batch_y.
  • Now comes the most important step. Just like you ran the initializer after creating the graph, you now feed the placeholders x and y the actual data in a dictionary and run the session, passing the cost and the accuracy that you defined earlier. It returns the loss (cost) and accuracy.
  • You can print the loss and training accuracy after each epoch (training iteration) is completed.
  • After each training iteration is completed, you run only the accuracy by passing all of the 10,000 test images and labels. This gives you an idea of how accurately your model is performing while it is training.
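
The batch indexing in the inner loop can be sketched on its own, without TensorFlow. This is a minimal illustration with hypothetical sizes; batch_slices is an assumed helper name, not something the tutorial defines.

```python
def batch_slices(num_examples, batch_size):
    """Yield (start, end) index pairs for each full mini-batch."""
    for batch in range(num_examples // batch_size):
        start = batch * batch_size
        end = min((batch + 1) * batch_size, num_examples)
        yield start, end

# Hypothetical sizes, small enough to inspect by hand.
slices = list(batch_slices(num_examples=10, batch_size=4))
print(slices)  # [(0, 4), (4, 8)]
```

Because the number of batches comes from integer division, any leftover examples that do not fill a whole batch (the last two in this example) are skipped in each epoch.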

It's usually recommended to test only once your model is fully trained, and to use a separate validation set after each epoch while it is still training. However, let's stick with this approach for now.

with tf.Session() as sess:
    # Initialize the weights and biases before using them
    sess.run(init)
    train_loss = []
    test_loss = []
    train_accuracy = []
    test_accuracy = []
    summary_writer = tf.summary.FileWriter('./Output', sess.graph)
    for i in range(training_iters):
        for batch in range(len(train_X)//batch_size):
            batch_x = train_X[batch*batch_size:min((batch+1)*batch_size,len(train_X))]
            batch_y = train_y[batch*batch_size:min((batch+1)*batch_size,len(train_y))]
            # Run optimization op (backprop).
            opt = sess.run(optimizer, feed_dict={x: batch_x,
                                                 y: batch_y})
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y})
        print("Iter " + str(i) + ", Loss= " + \
                      "{:.6f}".format(loss) + ", Training Accuracy= " + \
                      "{:.5f}".format(acc))
        print("Optimization Finished!")

        # Calculate accuracy and loss for all 10000 test images
        test_acc,valid_loss = sess.run([accuracy,cost], feed_dict={x: test_X,y : test_y})
        # Record this epoch's metrics so they can be plotted later
        train_loss.append(loss)
        test_loss.append(valid_loss)
        train_accuracy.append(acc)
        test_accuracy.append(test_acc)
        print("Testing Accuracy:","{:.5f}".format(test_acc))
    summary_writer.close()
Iter 0, Loss= 0.383201, Training Accuracy= 0.84375
Optimization Finished!
Testing Accuracy: 0.83500
Iter 1, Loss= 0.205107, Training Accuracy= 0.92969
Optimization Finished!
Testing Accuracy: 0.87630
Iter 2, Loss= 0.163720, Training Accuracy= 0.96094
Optimization Finished!
Testing Accuracy: 0.88950
Iter 3, Loss= 0.135824, Training Accuracy= 0.96875
Optimization Finished!
Testing Accuracy: 0.89450
Iter 4, Loss= 0.120255, Training Accuracy= 0.97656
Optimization Finished!
Testing Accuracy: 0.90190
Iter 5, Loss= 0.116372, Training Accuracy= 0.97656
Optimization Finished!
Testing Accuracy: 0.90210
Iter 6, Loss= 0.114322, Training Accuracy= 0.95312
Optimization Finished!
Testing Accuracy: 0.90260
Iter 7, Loss= 0.095541, Training Accuracy= 0.97656
Optimization Finished!
Testing Accuracy: 0.90110
Iter 8, Loss= 0.094024, Training Accuracy= 0.96875
Optimization Finished!
Testing Accuracy: 0.90060
Iter 9, Loss= 0.079477, Training Accuracy= 0.98438
Optimization Finished!
Testing Accuracy: 0.90130

The test accuracy looks impressive. It turns out that your classifier does better than the benchmark that was reported here, which is an SVM classifier with a mean accuracy of 0.897. Also, the model does well compared to some of the deep learning models mentioned on the GitHub profile of the creators of the fashion-MNIST dataset.

However, the training accuracy is noticeably higher than the testing accuracy, which suggests that the model is overfitting. Are these results all that good?

Let's put your model evaluation into perspective and plot the accuracy and loss for the training and test data:

plt.figure()
plt.plot(range(len(train_loss)), train_loss, 'b', label='Training loss')
plt.plot(range(len(train_loss)), test_loss, 'r', label='Test loss')
plt.title('Training and Test loss')
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Loss',fontsize=16)
plt.legend()
plt.show()

plt.figure()
plt.plot(range(len(train_loss)), train_accuracy, 'b', label='Training Accuracy')
plt.plot(range(len(train_loss)), test_accuracy, 'r', label='Test Accuracy')
plt.title('Training and Test Accuracy')
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Accuracy',fontsize=16)
plt.legend()
plt.show()

From the above two plots, you can see that the test accuracy became almost stagnant after 8 epochs and increased only occasionally after that. In the beginning, the testing accuracy rose almost linearly as the loss fell, but then it did not increase much.

The testing loss shows signs of overfitting. Like the training loss, it decreased almost linearly at first, but after 5 epochs it started to increase. This means that the model tried to memorize the training data and succeeded.
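
One simple way to locate the point where the test loss stops improving is to look for the epoch with the lowest recorded value. The sketch below uses made-up loss values purely for illustration (they are not results from this tutorial), and best_epoch is a hypothetical helper.

```python
def best_epoch(losses):
    """Return the index of the epoch with the lowest loss."""
    return min(range(len(losses)), key=lambda i: losses[i])

# Illustrative values only, not results from this tutorial's run.
example_test_loss = [0.45, 0.36, 0.31, 0.29, 0.28, 0.30, 0.33, 0.37]
turning_point = best_epoch(example_test_loss)
print(turning_point)  # 4
```

Calling best_epoch(test_loss) on the list collected during training would give the corresponding epoch for your own run.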

This was it for this tutorial, but there is a task for you all:

  • Your task is to reduce the overfitting of the above model by introducing the dropout technique. For simplicity, you may like to follow along with the tutorial Convolutional Neural Networks in Python with Keras. Even though it uses Keras, the accuracy and loss heuristics are pretty much the same, and both tutorials use the same architecture, so following along will help you add dropout layers to your current model.
  • Secondly, try to improve the testing accuracy by deepening the network a bit, adding learning rate decay for faster convergence, or trying to play with the optimizer and so on!
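
As a starting point for the first task, here is a pure-Python sketch of what dropout does to a layer's activations. It illustrates the "inverted dropout" idea only and is not the TensorFlow implementation; the function signature and the seed are assumptions made for this example. In the TensorFlow 1.x API used in this tutorial, tf.nn.dropout with a keep_prob argument provides the equivalent op, and the fully connected layer fc1 is one possible place to apply it.

```python
import random

def dropout(values, keep_prob, training, rng):
    """Inverted dropout on a list of activations."""
    if not training:
        return list(values)  # no units are dropped at test time
    # Keep each unit with probability keep_prob and rescale the survivors
    # so the expected activation stays the same.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 8
dropped = dropout(activations, keep_prob=0.5, training=True, rng=rng)
unchanged = dropout(activations, keep_prob=0.5, training=False, rng=rng)
print(dropped, unchanged)
```

During training, each activation is either zeroed or scaled up by 1 / keep_prob; at test time the activations pass through untouched.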

Processing your own Data

You will work with the CIFAR-10 data, which consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images in the official data.

The dataset can be downloaded from the Kaggle website. It comes in .7z format, which you need to uncompress; afterwards, you will have the .png image files in a folder.

If you are using a MacBook, you can install p7zip using brew install p7zip, and once it's installed, run 7z x train.7z. This creates a train folder containing 50,000 .png images.

The Kaggle download is already split into train and test archives, so you can download the files you need. For now, you will only download the train.7z archive.

The dataset consists of the following classes, all of which are mutually exclusive:

  • airplane
  • automobile
  • bird
  • cat
  • deer
  • dog
  • frog
  • horse
  • ship
  • truck

The first step is to list the files in the train folder using Python's built-in glob module and then read trainLabels.csv using the pandas library.

import glob
import pandas as pd
imgs = []
label = []
data = glob.glob('train/*')
labels_main = pd.read_csv('trainLabels.csv')
labels_main.head()
   id       label
0   1        frog
1   2       truck
2   3       truck
3   4        deer
4   5  automobile

You only need the second column (label) from the labels_main data frame, which you can access with the pandas .iloc indexer. Once you have the second column, convert it into a list using .tolist().

labels = labels_main.iloc[:,1].tolist()

Next, you need to create a dictionary that will map your categorical string into an integer value. Then you will use list comprehension and apply the mapping on the labels list that you created above.

Finally, you will convert these integer values into one-hot encoding values using the to_categorical function.

conversion = {'airplane':0,'automobile':1,'bird':2,'cat':3, 'deer':4, 'dog':5, 'frog':6,\
              'horse':7, 'ship':8, 'truck':9}
num_labels = []
num_labels.append([conversion[item] for item in labels])
num_labels = np.array(num_labels)
num_labels
array([[6, 9, 9, ..., 9, 1, 1]])
from keras.utils import to_categorical
Using TensorFlow backend.
label_one = to_categorical(num_labels)
label_one = label_one.reshape(-1,10)
label_one.shape
(50000, 10)
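
To see what the mapping and the one-hot step produce for individual labels, here is a small pure-Python sketch that does not rely on Keras. The one_hot helper is hypothetical and only mimics the result of to_categorical for a single integer label.

```python
conversion = {'airplane': 0, 'automobile': 1, 'bird': 2, 'cat': 3, 'deer': 4,
              'dog': 5, 'frog': 6, 'horse': 7, 'ship': 8, 'truck': 9}

def one_hot(index, num_classes=10):
    """Return a list with a 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

sample_labels = ['frog', 'truck', 'deer']  # a few class names from the dataset
sample_ints = [conversion[name] for name in sample_labels]
sample_one_hot = [one_hot(i) for i in sample_ints]
print(sample_ints)        # [6, 9, 4]
print(sample_one_hot[0])
```

Each one-hot row has exactly one non-zero entry, at the position given by the integer label.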

Now you will read the images from the train folder by looping one-by-one using OpenCV and store them in a list, and finally, you will convert that list into a NumPy array. The shape of your final output should be (50000, 32, 32, 3).

import cv2
for i in data:
    img = cv2.imread(i)
    # Skip files that OpenCV could not read
    if img is not None:
        imgs.append(img)
train_imgs = np.array(imgs)
train_imgs.shape
(50000, 32, 32, 3)
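
One thing to watch here: glob.glob returns paths in arbitrary order, while the rows of trainLabels.csv are ordered by id. The sketch below shows one way to order the paths numerically, assuming the image files are named after their id (for example train/1.png, train/2.png); that naming scheme and the helper image_id are assumptions, so check them against your download.

```python
import os

def image_id(path):
    """Extract the numeric id from a path such as 'train/12.png'."""
    return int(os.path.splitext(os.path.basename(path))[0])

# Hypothetical paths in the arbitrary order a directory listing might return.
paths = ['train/10.png', 'train/2.png', 'train/1.png']
ordered = sorted(paths, key=image_id)
print(ordered)  # ['train/1.png', 'train/2.png', 'train/10.png']
```

Sorting data with key=image_id before the loop would keep row i of train_imgs aligned with row i of labels_main, provided no image fails to load. A plain alphabetical sort would not be enough, because '10.png' sorts before '2.png'.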

Let's visualize a couple of images from the training dataset. Note that the class labels and image semantics should be in sync, which should also act as a validation that the data preprocessing was done correctly.


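plt.figure(figsize=[5,5])

# Display the first image in training data
# Note: cv2.imread returns channels in BGR order, so colors may look swapped in imshow
plt.subplot(121)
curr_img = np.reshape(train_imgs[0], (32,32,3))
curr_lbl = labels_main.iloc[0,1]
plt.imshow(curr_img)
plt.title("(Label: " + str(curr_lbl) + ")")

# Display the second image in training data
plt.subplot(122)
curr_img = np.reshape(train_imgs[1], (32,32,3))
curr_lbl = labels_main.iloc[1,1]
plt.imshow(curr_img)
plt.title("(Label: " + str(curr_lbl) + ")")
Text(0.5, 1.0, '(Label: truck)')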

As a final step, you will:

  • Normalize your images between 0 and 1 before you feed them into the convolutional neural network.
  • Split the 50,000 images into training and testing images with a 20% test split, which means the model will be trained on 40,000 images and tested on 10,000 images.

train_images = train_imgs / np.max(train_imgs)
np.max(train_images), np.min(train_images)
(1.0, 0.0)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(train_images, label_one, test_size=0.2, random_state=42)
train_X = X_train.reshape(-1, 32, 32, 3)
test_X = X_test.reshape(-1, 32, 32, 3)

Now, you are all set to feed the data into the convolutional neural network you created and trained above. However, you will have to make slight modifications before you can train and test the model. Making these modifications is a good exercise for understanding how the dimensions of the parameters, as well as the overall architecture, change when your input and output vary.
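
As a hint for that exercise, the sketch below computes the weight shapes a three-stage network of this kind would need for a given input shape. The filter counts (32, 64, 128), the 3x3 kernel, and the 128 fully connected units are assumptions chosen for illustration, as is the helper name weight_shapes; substitute the values from your own weights dictionary.

```python
import math

def weight_shapes(height, width, channels, filters=(32, 64, 128),
                  fc_units=128, n_classes=10, kernel=3):
    """Shapes of conv, dense, and output weights for a three-stage CNN."""
    shapes = {}
    in_channels = channels
    for idx, out_channels in enumerate(filters, start=1):
        shapes['wc%d' % idx] = (kernel, kernel, in_channels, out_channels)
        in_channels = out_channels
        # Each 2x2 'SAME' max-pool halves the spatial size, rounding up.
        height, width = math.ceil(height / 2), math.ceil(width / 2)
    shapes['wd1'] = (height * width * in_channels, fc_units)
    shapes['out'] = (fc_units, n_classes)
    return shapes

cifar_shapes = weight_shapes(32, 32, 3)
print(cifar_shapes['wc1'], cifar_shapes['wd1'])
```

With a 32x32x3 input, the first filter bank needs three input channels instead of one, and the placeholder x has to accept three-channel images as well. Under these assumptions the spatial size shrinks 32, 16, 8, 4 across the three pooling stages.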

Go Further and Master Deep Learning with TensorFlow!

This tutorial was a good start to understanding how TensorFlow works underneath the hood, along with an implementation of convolutional neural networks in Python.

If you were able to follow along easily, well done! Try doing some experiments with the same model architecture but using different public datasets. You could also try different weight initializers, deepen the network architecture, change the learning rate, and so on, and see how your network performs when you change these parameters. But try changing them one at a time only. That way, you will build intuition about each parameter without getting confused; this kind of one-change-at-a-time experiment is in the spirit of an ablation study!

There is still a lot to cover, so why not take DataCamp’s Deep Learning in Python course? Also make sure to check out the TensorFlow documentation, if you haven’t done so already. You will find more examples and information on all functions, arguments, more layers, etc. It will undoubtedly be an indispensable resource when you’re learning how to work with neural networks in Python!