TL;DR Build and train your first Neural Network model using TensorFlow 2. Use the model to recognize clothing type from images.
Ok, I’ll start with a secret — I am THE fashion wizard (as long as we’re talking tracksuits). Fortunately, there are ways to get help, even for someone like me!
Can you imagine a really helpful browser extension for “fashion accessibility”? Something that tells you what type of clothing you’re looking at.
After all, I really need something like this. I found out nothing like this exists, without even searching for it. Let’s make a Neural Network that predicts clothing type from an image!
Here’s what we are going to do:
- Install TensorFlow 2
- Take a look at some fashion data
- Transform the data, so it is useful for us
- Create your first Neural Network in TensorFlow 2
- Predict what type of clothing is shown in images your Neural Network hasn’t seen
With TensorFlow 2 just around the corner (not sure how far along that corner is, though), making your first Neural Network has never been easier (as far as TensorFlow goes).
But what is TensorFlow? A Machine Learning platform (really, Google?) created and open sourced by Google. Note that TensorFlow is not a special purpose library for creating Neural Networks, although it is primarily used for that purpose.
So, what does TensorFlow 2 have in store for us?
TensorFlow 2.0 focuses on simplicity and ease of use, with updates like eager execution, intuitive higher-level APIs, and flexible model building on any platform
Alright, let’s check those claims and install TensorFlow 2 from your terminal:
pip install tensorflow-gpu==2.0.0-alpha0
Your Neural Network needs something to learn from. In Machine Learning, that something is called a dataset. The dataset for today is called Fashion MNIST.
Fashion-MNIST is a dataset of Zalando’s article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
In other words, we have 70,000 greyscale images, each 28 pixels wide and 28 pixels high. Each image shows one of 10 possible clothing types. Here is one:
Here are some images from the dataset along with the clothing they are showing:
Here are all different types of clothing:
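In text form, the ten labels are plain integers from 0 to 9. The mapping below follows the class names published with Fashion-MNIST; the names `CLASS_NAMES` and `label_to_name` are my own helpers for illustration, since loading the dataset through Keras gives you only the integers.

```python
# Fashion-MNIST label index -> human-readable class name.
# CLASS_NAMES and label_to_name are illustrative helpers, not part of Keras.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_name(label: int) -> str:
    """Translate an integer label (0-9) into its clothing type."""
    return CLASS_NAMES[label]


print(label_to_name(9))
```

Keeping the list in label order means a predicted class index can be turned into a name with a single lookup.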
Now that we are familiar with the data, let’s make it usable for our Neural Network.
Let’s start with loading our data into memory:
import tensorflow as tf
from tensorflow import keras

(x_train, y_train), (x_val, y_val) = keras.datasets.fashion_mnist.load_data()
Fortunately, TensorFlow has the dataset built-in, so we can easily obtain it.
Loading it gives us 4 things:
x_train — image (pixel) data for 60,000 clothes. Used for training our model.
y_train — classes (clothing type) for the clothing above. Used for training our model.
x_val — image (pixel) data for 10,000 clothes. Used for testing/validating our model.
y_val — classes (clothing type) for the clothing above. Used for testing/validating our model.
Now, your Neural Network can’t really see images as you do. But it can understand numbers. Each data point of each image in our dataset is pixel data: a number between 0 and 255. We would like to scale that data to the 0–1 range. Why? While the truth is more nuanced, one can say it helps with training a better model. How can we do it?
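To see the scaling on its own, before it gets wired into a data pipeline, here is a minimal sketch in plain Python. The function name `scale_pixels` and the sample values are made up for illustration; they are not taken from the dataset.

```python
def scale_pixels(row):
    """Map raw 0-255 pixel intensities to floats in the 0-1 range."""
    return [pixel / 255.0 for pixel in row]


raw_row = [0, 51, 102, 255]  # made-up pixel values, not from Fashion-MNIST
scaled_row = scale_pixels(raw_row)
print(scaled_row)
```

Black (0) stays 0.0, full intensity (255) becomes 1.0, and everything else lands in between.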
We will use the Dataset from TensorFlow to prepare our data:
def preprocess(x, y):
    # Scale pixel values from 0-255 down to 0-1
    x = tf.cast(x, tf.float32) / 255.0
    y = tf.cast(y, tf.int64)
    return x, y

def create_dataset(xs, ys, n_classes=10):
    # Turn integer labels into one-hot vectors
    ys = tf.one_hot(ys, depth=n_classes)
    return tf.data.Dataset.from_tensor_slices((xs, ys)) \
        .map(preprocess) \
        .shuffle(len(ys)) \
        .batch(128)
Let’s unpack what is happening here. What does tf.one_hot do? Let’s say you have the following vector:
[0, 1, 2, 0]
Here is the one-hot encoded version of it:
[ [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0] ]
It puts 1 at the index position of the number and 0 everywhere else.
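If you want to convince yourself of what one-hot encoding does without TensorFlow, here is a plain-Python sketch. It assumes zero-based integer labels (like the 0 to 9 labels in Fashion-MNIST); the function `one_hot` below is my own stand-in, not the TensorFlow implementation.

```python
def one_hot(labels, depth):
    """Return one row per label: 1 at the label's index, 0 everywhere else."""
    return [[1 if i == label else 0 for i in range(depth)] for label in labels]


encoded = one_hot([0, 1, 2, 0], depth=3)
for row in encoded:
    print(row)
```

With `depth=10`, each clothing label becomes a vector of ten numbers, which lines up with the ten outputs of the model we build later.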
We create a Dataset from the data using from_tensor_slices and divide each pixel of the images by 255 to scale it to the 0–1 range.
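The whole pipeline (pair up, scale, shuffle, batch) can be imitated with lists. This is only a rough sketch of the idea, assuming tiny made-up data; `make_batches` is a hypothetical helper and says nothing about how `tf.data` works internally.

```python
import random


def make_batches(xs, ys, batch_size, seed=0):
    """Pair examples with labels, scale pixels, shuffle, then cut into batches."""
    # "map" step: scale every pixel to the 0-1 range
    pairs = [([p / 255.0 for p in x], y) for x, y in zip(xs, ys)]
    # "shuffle" step: a fixed seed only keeps this demo repeatable
    random.Random(seed).shuffle(pairs)
    # "batch" step: consecutive chunks of batch_size examples
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]


xs = [[i, 255 - i] for i in range(10)]  # ten fake two-pixel "images"
ys = list(range(10))                    # fake labels
batches = make_batches(xs, ys, batch_size=4)
print([len(batch) for batch in batches])  # [4, 4, 2]
```

Note that the last batch is smaller when the number of examples is not a multiple of the batch size, and that each image stays attached to its own label through the shuffle.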
Why shuffle the data, though? We don’t want our model to make predictions based on the order of the training data, so we just shuffle it.
I am truly sorry for that bad joke:
You’re doing great! It is time for the fun part: using the data to create your first Neural Network.
train_dataset = create_dataset(x_train, y_train)
val_dataset = create_dataset(x_val, y_val)
They say TensorFlow 2 has an easy High-level API. Let’s take it for a spin:
model = keras.Sequential([
    keras.layers.Reshape(
        target_shape=(28 * 28,), input_shape=(28, 28)
    ),
    keras.layers.Dense(
        units=256, activation='relu'
    ),
    keras.layers.Dense(
        units=192, activation='relu'
    ),
    keras.layers.Dense(
        units=128, activation='relu'
    ),
    keras.layers.Dense(
        units=10, activation='softmax'
    )
])
Turns out the High-level API is the old Keras API, which is great.
Most Neural Networks are built by “stacking” layers. Think pancakes or lasagna. Your first Neural Network is really simple. It has 5 layers.
The first (Reshape) layer is called an input layer and takes care of converting the input data for the layers below. Our images are 28*28=784 pixels. We’re just converting the 2D 28x28 array to a 1D array of 784 values.
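Here is what that flattening amounts to, sketched with nested lists instead of tensors. The `flatten` helper and the fake image are assumptions for illustration only.

```python
def flatten(image):
    """Concatenate the rows of a 2D image into one flat list."""
    return [pixel for row in image for pixel in row]


# A fake 28x28 "image" whose pixel value is its position, to make order visible
image = [[row * 28 + col for col in range(28)] for row in range(28)]
flat = flatten(image)
print(len(image), len(image[0]), len(flat))  # 28 28 784
```

The rows are simply laid end to end, so the first pixel of the second row ends up at position 28 in the flat list.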
All other layers are Dense (fully connected). You might notice the parameter units: it sets the number of neurons for each layer. The activation parameter specifies a function that decides whether “the opinion” of a particular neuron in the layer should be taken into account, and to what degree. There are a lot of activation functions one can use.
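The two activation functions used in our model are simple enough to write by hand. The versions below are a plain-Python sketch of the math, not the TensorFlow implementations, and the sample scores are made up.

```python
import math


def relu(x):
    """Pass positive values through unchanged, silence negative ones."""
    return max(0.0, x)


def softmax(scores):
    """Turn arbitrary scores into positive numbers that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum does not change the result,
    # it only avoids overflow in exp() for large scores
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), relu(3.5))  # 0.0 3.5
probabilities = softmax([2.0, 1.0, 0.1])  # made-up scores
print(probabilities)
```

ReLU is what the hidden layers use; softmax sits on the output layer so that the ten outputs can be read as probabilities.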
The last (output) layer is a special one. It has 10 neurons because we have 10 different types of clothing in our data. You get the predictions of the model from this layer.
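A quick way to get a feel for the size of this network is to count its trainable numbers. The sketch below assumes the usual Dense layer layout (one weight per input-neuron pair plus one bias per neuron); the Reshape layer has nothing to learn. The result should match what `model.summary()` reports, though I am computing it by hand here.

```python
# Number of values flowing between layers: 784 inputs, then the units of each Dense layer
layer_sizes = [784, 256, 192, 128, 10]


def dense_params(n_inputs, n_units):
    """Weights (inputs x units) plus one bias per unit."""
    return n_inputs * n_units + n_units


per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer)  # [200960, 49344, 24704, 1290]
print(total)      # 276298
```

Most of the parameters live in the first Dense layer, simply because it is connected to all 784 pixels.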
Right now your Neural Network is plain dumb. It is like a shell without a soul (good that you get that). Let’s train it using our data:
model.compile(
    optimizer='adam',
    # The output layer already applies softmax, so the loss receives
    # probabilities rather than raw logits
    loss=tf.losses.CategoricalCrossentropy(from_logits=False),
    metrics=['accuracy']
)

history = model.fit(
    train_dataset.repeat(),
    epochs=10,
    steps_per_epoch=500,
    validation_data=val_dataset.repeat(),
    validation_steps=2
)
Training a Neural Network consists of deciding on an objective measurement of accuracy and an algorithm that knows how to improve on it.
TensorFlow allows us to specify the optimizer algorithm we’re going to use (Adam) and the measurement, or loss function (CategoricalCrossentropy, since we’re classifying images into 10 different types of clothing). We’re measuring the accuracy of the model during the training, too!
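To make the loss less mysterious, here is a hand-rolled sketch of categorical cross-entropy for a single example. It assumes the prediction already holds probabilities (as a softmax output does) and the label is one-hot encoded; the function name, the `eps` guard, and the sample numbers are mine.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log of the probability the model gave to the correct class."""
    # eps guards against log(0) when a probability is exactly zero
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


label = [0, 0, 1]                # one-hot: the correct class is index 2
good_guess = [0.1, 0.2, 0.7]     # made-up prediction, mostly right
bad_guess = [0.7, 0.2, 0.1]      # made-up prediction, mostly wrong

good_loss = categorical_crossentropy(label, good_guess)
bad_loss = categorical_crossentropy(label, bad_guess)
print(good_loss, bad_loss)
```

The more probability the model puts on the correct class, the smaller the loss, which is exactly the signal the optimizer needs.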
The actual training takes place when the fit method is called. We give our training and validation data to it and specify how many epochs we’re training for. During one training epoch, the model is shown roughly the whole training set once.
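How do the numbers passed to fit relate to the size of the dataset? A bit of arithmetic helps. The values below are the ones used in this post (60,000 training images, batches of 128, 500 steps per epoch); the variable names are just for this sketch.

```python
import math

train_size = 60_000      # training images in Fashion-MNIST
batch_size = 128         # set in create_dataset
steps_per_epoch = 500    # passed to model.fit

batches_per_pass = math.ceil(train_size / batch_size)
images_per_epoch = steps_per_epoch * batch_size

print(batches_per_pass)   # 469
print(images_per_epoch)   # 64000
```

So one full pass over the data takes 469 batches, and with 500 steps an epoch sees slightly more than one pass. That only works because the dataset is repeated, which is what the `.repeat()` call is for.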
Here is a sample result of our training (your numbers will differ from run to run):
Epoch 1/10
500/500 [==============================] - 9s 18ms/step - loss: 1.7340 - accuracy: 0.7303 - val_loss: 1.6871 - val_accuracy: 0.7812
Epoch 2/10
500/500 [==============================] - 6s 12ms/step - loss: 1.6806 - accuracy: 0.7807 - val_loss: 1.6795 - val_accuracy: 0.7812
...
I got ~82% accuracy on the validation set after 10 epochs. Let’s profit from our model!
Now that your Neural Network “learned” something, let’s try it out:
predictions = model.predict(val_dataset)
Here is a sample prediction:
array([ 1.8154810e-07, 1.0657334e-09, 9.9998713e-01, 1.1928002e-05, 2.9766360e-08, 4.0670972e-08, 2.5100772e-07, 4.5147233e-11, 2.9812568e-07, 3.5224868e-11 ], dtype=float32)
Recall that we have 10 different clothing types. Our model outputs a probability distribution about how likely each clothing type is shown on an image. To make a decision, we can get the one with the highest probability:
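One way to do that, sketched in plain Python with the sample prediction from above (with NumPy you would typically reach for `np.argmax` instead):

```python
# The sample prediction shown above, copied as a plain list
prediction = [
    1.8154810e-07, 1.0657334e-09, 9.9998713e-01, 1.1928002e-05,
    2.9766360e-08, 4.0670972e-08, 2.5100772e-07, 4.5147233e-11,
    2.9812568e-07, 3.5224868e-11,
]

# Index of the largest probability = predicted class label
predicted_label = max(range(len(prediction)), key=lambda i: prediction[i])
print(predicted_label)  # 2
```

The model is almost certain about class 2 here, which is "Pullover" in the Fashion-MNIST label scheme.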
Here is one correct and one wrong prediction from our model:
Alright, you got your first Neural Network running and made some predictions! You can take a look at the Google Colaboratory Notebook (including more charts) here:
One day you might realize that your relationship with Machine Learning is similar to marriage. The problems you might encounter are similar, too! What Makes Marriages Work by John Gottman and Nan Silver lists 5 problems marriages have: “Money, Kids, Sex, Time, Others”. Here are the Machine Learning counterparts:
Shall we tackle them together?