Author: Paul A. Beata
GitHub: pbeata
The data used for this project is the MNIST set of handwritten digits, available through the TensorFlow Datasets module.
This project uses the MNIST data set for a deep learning application: handwritten digit recognition. The data set provides 70,000 grayscale images (each of size 28x28 pixels) of handwritten digits, with one digit per image.
The goal of this project is to write an algorithm that detects which digit is written in each input image. Since there are only 10 possible digits {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, this is a classification task with 10 classes.
For this deep learning application, we will employ TensorFlow and use its Keras API to build the neural network.
import numpy as np
import tensorflow as tf
%config Completer.use_jedi = False
The data set for this application comes from the "tensorflow-datasets" module; therefore, if you do not have it installed on your system, you must first run one of the following install commands:
# TensorFlow includes the data provided for the MNIST application that we'll use.
# It comes with the tensorflow-datasets module, therefore, if you haven't yet, please install the package using:
# option 1 (uncomment and run the next line)
# pip install tensorflow-datasets
# OR
# option 2 (uncomment and run the next line)
# conda install tensorflow-datasets
import tensorflow_datasets as tfds
These datasets will be stored in the path "C:\Users\USERNAME\tensorflow_datasets\" (on Windows). The first time you download a dataset, it is stored in its respective folder; every time after that, it is automatically loaded from the copy on your machine.
Here we will perform two main tasks on the MNIST number recognition dataset: loading the data and preparing it for training.
The method tfds.load loads a dataset (or downloads and then loads it, if it's the first time we are using it) from the tensorflow-datasets module. In this case, we are interested in the MNIST data set.
# download or load the MNIST data set from TensorFlow
# mnist_dataset = tfds.load(name='mnist', as_supervised=True)
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
Using the optional input "as_supervised=True" will load the data set in a 2-tuple structure: (input, target). This separates the input data from the target data for us automatically and is simply a matter of convenience.
# once we have loaded the dataset, we can easily extract the training and testing dataset with the built-in references
mnist_train = mnist_dataset['train']
mnist_test = mnist_dataset['test']
The data set was originally split into TRAIN and TEST subsets already. First, we can check the percentages of each split here using the "mnist_info" data structure to extract info about the loaded data.
# get the number of train and test samples from the data set
num_train = mnist_info.splits['train'].num_examples
num_test = mnist_info.splits['test'].num_examples
num_total = num_train + num_test
# check the train-test split percentage
percent_train = 100 * num_train / num_total
percent_test = 100 * num_test / num_total
# round the percentages and display them here:
percent_train = np.round(percent_train, 1)
percent_test = np.round(percent_test, 1)
print("Training Data: {p}%".format(p=percent_train))
print("Testing Data: {p}%".format(p=percent_test))
Training Data: 85.7%
Testing Data: 14.3%
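As a quick sanity check, the same percentages can be reproduced with plain Python, since the standard MNIST split is 60,000 training and 10,000 test images:

```python
# MNIST's standard split: 60,000 training and 10,000 test images
num_train, num_test = 60_000, 10_000
num_total = num_train + num_test

percent_train = round(100 * num_train / num_total, 1)
percent_test = round(100 * num_test / num_total, 1)
print(percent_train, percent_test)  # 85.7 14.3
```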
By default, the MNIST data set in TensorFlow has pre-labeled "train" and "test" datasets, but there is no validation subset. Therefore, we will split it on our own here in order to generate a validation subset from the training data.
# specify the fraction of training data that will be reserved for validation (we use 10% here)
validation_fraction = 0.1
num_valid = validation_fraction * num_train
# make sure that the number of validation samples is stored as an integer:
# num_valid = tf.cast(num_valid, tf.int64)
num_valid = int(num_valid)
num_valid
6000
Note: we will perform the actual split after scaling next. This step was only to prepare the relative proportions of each of the data subsets.
Here we will scale the data in order to make the results more numerically stable. In this case, using image data with pixel values between [0, 255], we will simply normalize each image's pixels such that all the inputs are between 0 and 1 instead.
# custom scaling function that will take an image from MNIST and normalize the pixel value range to [0,1]
def scale(image, label):
    # make sure the value is cast to a float first
    image = tf.cast(image, tf.float32)
    # since the possible values for the inputs are 0 to 255 (256 different shades of gray),
    # dividing each element by 255 gives the desired result:
    # --> all elements will be between 0 and 1
    image /= 255.0
    return image, label
# the method .map() allows us to apply this custom transformation ("scale") to our data set
scaled_train_data = mnist_train.map(scale)
scaled_test_data = mnist_test.map(scale)
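The same normalization can be sketched with NumPy on a stand-in image (the pixel values below are made up; only the uint8 range mirrors MNIST):

```python
import numpy as np

# stand-in grayscale "image" with the same uint8 range as MNIST pixels
image = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# cast to float and divide by 255, exactly as the scale() function above does
scaled = image.astype(np.float32) / 255.0

print(scaled.min(), scaled.max())  # 0.0 1.0
```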
We use the BUFFER_SIZE parameter for cases when we are dealing with very large data sets. In those cases we cannot shuffle the whole data set at once because it does not all fit in memory; instead, TensorFlow stores only BUFFER_SIZE samples in memory at a time and shuffles within that buffer. Note: if BUFFER_SIZE=1, no shuffling actually happens, while if BUFFER_SIZE >= the number of samples, the shuffling is uniform across the whole data set. Choosing a BUFFER_SIZE in between is a computational optimization that approximates uniform shuffling. A shuffle method is already available; we just need to specify the buffer size to use it here.
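To build intuition for the buffer behavior without TensorFlow, here is a toy pure-Python version of a buffered shuffle (the function `buffered_shuffle` is my own sketch, not the actual tf.data internals):

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Toy buffered shuffle: keep at most buffer_size elements in memory,
    emitting a randomly chosen buffered element as each new one arrives."""
    rng = random.Random(seed)
    it = iter(items)
    buffer = []
    # fill the buffer
    for x in it:
        buffer.append(x)
        if len(buffer) == buffer_size:
            break
    out = []
    # swap each incoming element with a randomly chosen buffered one
    for x in it:
        i = rng.randrange(len(buffer))
        out.append(buffer[i])
        buffer[i] = x
    # drain the remaining buffer in random order
    rng.shuffle(buffer)
    out.extend(buffer)
    return out

print(buffered_shuffle(range(10), buffer_size=1))   # order unchanged
print(buffered_shuffle(range(10), buffer_size=10))  # a uniform permutation
```

With buffer_size=1 the output order is identical to the input (no shuffling), while buffer_size equal to the data size shuffles uniformly, matching the note above.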
Shuffle
# choose the buffer size for shuffling the data: 10% of the training data set size
buffer_fraction = 0.10
BUFFER_SIZE = int(buffer_fraction * num_train)
BUFFER_SIZE
6000
# shuffle the scaled training data
shuffled_scaled_train_data = scaled_train_data.shuffle(BUFFER_SIZE)
Split
Our validation data will be equal to 10% of the training set, which we've already calculated above.
# we use the .take() method to take as many samples as we need for the validation set
validation_data = shuffled_scaled_train_data.take(num_valid)
# Similarly, the training data will be everything else leftover:
# so, we skip as many samples as there are in the validation set.
train_data = shuffled_scaled_train_data.skip(num_valid)
We can also take this opportunity to batch the training data. This is helpful during training because we will be able to iterate over the different batches in the training data set.
# specify the batch size
BATCH_SIZE = 100
# batch the training data
train_data = train_data.batch(BATCH_SIZE)
# NOTE: for the validation and testing data, we will simply use the number of samples as the batch size for each
validation_data = validation_data.batch(num_valid)
test_data = scaled_test_data.batch(num_test)
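What .batch() does to a dataset can be sketched with NumPy (the sample count and batch size here are made up for illustration):

```python
import numpy as np

samples = np.arange(12)  # stand-in for 12 scaled training samples
BATCH_SIZE = 5

# group consecutive samples into batches, as dataset.batch(BATCH_SIZE) does;
# the final batch is smaller when the sizes don't divide evenly
batches = [samples[i:i + BATCH_SIZE] for i in range(0, len(samples), BATCH_SIZE)]
print([len(b) for b in batches])  # [5, 5, 2]
```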
# takes next batch (it is the only batch)
# because as_supervised=True, we've got a 2-tuple structure
validation_inputs, validation_targets = next(iter(validation_data))
First, we will outline the model to set up the machine learning algorithm for training.
# check the features from our mnist_info data structure to understand the input and output we are dealing with:
mnist_info.features
FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})
The MNIST data set contains thousands of images, each composed of pixels that define a handwritten digit. We can see here that the "image" object is a 28x28 tensor and the output ("label") has 10 classes. In other words, the input for each image in the set is a collection of 28*28 = 784 pixels, and the output size is 10 for the 10 possible digits to be predicted by the algorithm: {0, 1, ..., 9}.
# we can define the input and output size of our neural net using the info above from mnist_info:
input_size = (28, 28, 1) # shape of each "image"
output_size = 10 # num_classes for the "label"
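As a NumPy sketch of what flattening does to one 28x28x1 input (this is the per-image effect of a Keras Flatten layer):

```python
import numpy as np

image = np.zeros((28, 28, 1))  # one MNIST-shaped input
flat = image.reshape(-1)       # what a Flatten layer produces per image
print(flat.shape)              # (784,)
```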
One of the hyperparameters that we can optimize for the neural network is the size of the hidden layers. In this application, we will use a single size for all the hidden layers. However, since we don't know the optimal hidden layer size in advance, we will perform an optimization to try to find the best one.
# early stopping criteria: increase the patience a little
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)
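The rule this callback applies can be sketched in plain Python (`epochs_run` is a hypothetical helper of mine; Keras' EarlyStopping also supports options such as min_delta and restore_best_weights):

```python
def epochs_run(val_losses, patience=2):
    """Stop once the monitored value (here, validation loss) has failed
    to improve for `patience` consecutive epochs."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # training halts at this epoch
    return len(val_losses)

# improvement stalls after epoch 2, so training stops two epochs later
print(epochs_run([0.30, 0.25, 0.26, 0.27, 0.20], patience=2))  # 4
```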
Model Definition
# choose a hidden layer size (we know from previous work that a width of 50 works well)
width = 50
# define each layer of the NN model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=input_size),          # input layer
    tf.keras.layers.Dense(width, activation='relu'),          # 1st hidden layer
    tf.keras.layers.Dense(width, activation='relu'),          # 2nd hidden layer
    # we make sure to activate the output layer with softmax
    tf.keras.layers.Dense(output_size, activation='softmax')  # output layer
])
Choose the optimizer and the loss function for the model:
# Here we define the optimizer we'd like to use, the loss function,
# and the metrics we are interested in obtaining at each iteration:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
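For intuition, the loss chosen here (sparse categorical cross-entropy) is the mean negative log-probability the softmax assigns to the true class. A minimal NumPy version (my own sketch, not Keras' implementation, which also handles clipping and logits):

```python
import numpy as np

def sparse_cat_crossentropy(labels, probs):
    """Mean negative log-probability of the true class.
    labels: integer class ids; probs: rows of softmax outputs."""
    rows = np.arange(len(labels))
    return float(np.mean(-np.log(probs[rows, labels])))

probs = np.array([[0.7, 0.2, 0.1],   # predicts class 0 with p=0.7
                  [0.1, 0.8, 0.1]])  # predicts class 1 with p=0.8
labels = np.array([0, 1])
print(sparse_cat_crossentropy(labels, probs))  # ~0.29
```

The "sparse" variant takes integer labels directly, which is why we never one-hot encode the MNIST targets.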
Training
This is where we train the first model we have built.
# determine the maximum number of epochs
NUM_EPOCHS = 20
# we fit the model, specifying the:
# 1. training data,
# 2. the total number of epochs,
# 3. and the validation data we just created ourselves in the format: (inputs,targets)
model.fit(train_data,
          epochs=NUM_EPOCHS,
          callbacks=early_stopping,
          validation_data=(validation_inputs, validation_targets),
          verbose=2)
Epoch 1/20
540/540 - 12s - loss: 0.4172 - accuracy: 0.8823 - val_loss: 0.2410 - val_accuracy: 0.9297
Epoch 2/20
540/540 - 9s - loss: 0.1923 - accuracy: 0.9435 - val_loss: 0.1746 - val_accuracy: 0.9493
Epoch 3/20
540/540 - 13s - loss: 0.1451 - accuracy: 0.9575 - val_loss: 0.1496 - val_accuracy: 0.9587
Epoch 4/20
540/540 - 13s - loss: 0.1181 - accuracy: 0.9647 - val_loss: 0.1267 - val_accuracy: 0.9653
Epoch 5/20
540/540 - 13s - loss: 0.1006 - accuracy: 0.9700 - val_loss: 0.1162 - val_accuracy: 0.9655
Epoch 6/20
540/540 - 13s - loss: 0.0872 - accuracy: 0.9742 - val_loss: 0.0981 - val_accuracy: 0.9708
Epoch 7/20
540/540 - 12s - loss: 0.0758 - accuracy: 0.9771 - val_loss: 0.0890 - val_accuracy: 0.9730
Epoch 8/20
540/540 - 13s - loss: 0.0676 - accuracy: 0.9801 - val_loss: 0.0868 - val_accuracy: 0.9718
Epoch 9/20
540/540 - 13s - loss: 0.0612 - accuracy: 0.9810 - val_loss: 0.0818 - val_accuracy: 0.9758
Epoch 10/20
540/540 - 13s - loss: 0.0530 - accuracy: 0.9838 - val_loss: 0.0807 - val_accuracy: 0.9753
Epoch 11/20
540/540 - 10s - loss: 0.0494 - accuracy: 0.9847 - val_loss: 0.0796 - val_accuracy: 0.9763
Epoch 12/20
540/540 - 12s - loss: 0.0452 - accuracy: 0.9861 - val_loss: 0.0702 - val_accuracy: 0.9798
Epoch 13/20
540/540 - 13s - loss: 0.0402 - accuracy: 0.9882 - val_loss: 0.0667 - val_accuracy: 0.9807
Epoch 14/20
540/540 - 11s - loss: 0.0364 - accuracy: 0.9884 - val_loss: 0.0610 - val_accuracy: 0.9828
Epoch 15/20
540/540 - 10s - loss: 0.0331 - accuracy: 0.9897 - val_loss: 0.0660 - val_accuracy: 0.9798
Epoch 16/20
540/540 - 13s - loss: 0.0332 - accuracy: 0.9894 - val_loss: 0.0562 - val_accuracy: 0.9830
Epoch 17/20
540/540 - 13s - loss: 0.0297 - accuracy: 0.9910 - val_loss: 0.0616 - val_accuracy: 0.9813
Epoch 18/20
540/540 - 13s - loss: 0.0270 - accuracy: 0.9914 - val_loss: 0.0578 - val_accuracy: 0.9843
<tensorflow.python.keras.callbacks.History at 0x1eac5954940>
Testing the Initial Model
test_loss, test_acc = model.evaluate(test_data)
1/1 [==============================] - 2s 2s/step - loss: 0.1099 - accuracy: 0.9702
For the validation data set, we achieved a classification accuracy of 98.4% by the 18th epoch of training. This resulted in a model that produced a testing accuracy of 97%. While these results are good, they were made possible by having pre-existing knowledge that these hyperparameters (e.g., the NN depth of 2 and the hidden layer size of 50) would lead to good results. In the next section, we will try various combinations of hyperparameters to try and find the best combination assuming no prior knowledge.
Even though we achieved 97% accuracy on the original NN model above, we had some insight already from previous studies to choose hyperparameters that would lead to this performance. Now we will optimize assuming we know nothing about the "best" hyperparameters to use.
# HYPERPARAMETERS
# max number of epochs will remain the same
NUM_EPOCHS = 20
# we can test various activation functions here
act_func = ['relu', 'sigmoid', 'tanh']
# we will try to find the best hidden layer size from this list of options (assuming no prior knowledge)
hidden_layer_sizes = []
print("We will consider the following hidden layer sizes (width):")
for n in range(1, 9):
    width = 2 ** n
    print(width)
    hidden_layer_sizes.append(int(width))
We will consider the following hidden layer sizes (width):
2
4
8
16
32
64
128
256
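The sweep that follows trains one model per (width, activation) pair; the full grid can be enumerated up front with itertools:

```python
from itertools import product

act_func = ['relu', 'sigmoid', 'tanh']
hidden_layer_sizes = [2 ** n for n in range(1, 9)]  # 2, 4, ..., 256

# every (width, activation) combination the grid search will train
grid = list(product(hidden_layer_sizes, act_func))
print(len(grid), grid[0], grid[-1])  # 24 (2, 'relu') (256, 'tanh')
```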
Depth = 2
Perform the training and testing using a NN with a depth of 2 again, but consider all the possible combinations of activation functions and hidden layer size (model "width").
all_width = []
all_func = []
all_test_loss = []
all_test_acc = []
for width in hidden_layer_sizes:
    for func in act_func:
        # store the parameters
        all_width.append(width)
        all_func.append(func)
        # model definition
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=input_size),          # input layer
            tf.keras.layers.Dense(width, activation=func),            # 1st hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 2nd hidden layer
            tf.keras.layers.Dense(output_size, activation='softmax')  # output layer
        ])
        # optimizer, loss function, and metrics
        model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        # training
        model.fit(train_data,
                  epochs=NUM_EPOCHS,
                  callbacks=early_stopping,
                  validation_data=(validation_inputs, validation_targets),
                  verbose=0)
        # testing
        print("testing model with hidden layer size of {x} and activation functions of {y}".format(x=width, y=func))
        test_loss, test_acc = model.evaluate(test_data)
        all_test_loss.append(test_loss)
        all_test_acc.append(test_acc)
testing model with hidden layer size of 2 and activation functions of relu
1/1 [==============================] - 2s 2s/step - loss: 2.3010 - accuracy: 0.1135
testing model with hidden layer size of 2 and activation functions of sigmoid
1/1 [==============================] - 2s 2s/step - loss: 1.3422 - accuracy: 0.4265
testing model with hidden layer size of 2 and activation functions of tanh
1/1 [==============================] - 2s 2s/step - loss: 1.2204 - accuracy: 0.6092
testing model with hidden layer size of 4 and activation functions of relu
1/1 [==============================] - 2s 2s/step - loss: 0.4801 - accuracy: 0.8604
testing model with hidden layer size of 4 and activation functions of sigmoid
1/1 [==============================] - 2s 2s/step - loss: 0.7743 - accuracy: 0.7393
testing model with hidden layer size of 4 and activation functions of tanh
1/1 [==============================] - 2s 2s/step - loss: 0.5477 - accuracy: 0.8361
testing model with hidden layer size of 8 and activation functions of relu
1/1 [==============================] - 2s 2s/step - loss: 0.2530 - accuracy: 0.9288
testing model with hidden layer size of 8 and activation functions of sigmoid
1/1 [==============================] - 2s 2s/step - loss: 0.3142 - accuracy: 0.9136
testing model with hidden layer size of 8 and activation functions of tanh
1/1 [==============================] - 2s 2s/step - loss: 0.2671 - accuracy: 0.9215
testing model with hidden layer size of 16 and activation functions of relu
1/1 [==============================] - 2s 2s/step - loss: 0.1690 - accuracy: 0.9529
testing model with hidden layer size of 16 and activation functions of sigmoid
1/1 [==============================] - 2s 2s/step - loss: 0.1951 - accuracy: 0.9431
testing model with hidden layer size of 16 and activation functions of tanh
1/1 [==============================] - 2s 2s/step - loss: 0.1707 - accuracy: 0.9492
testing model with hidden layer size of 32 and activation functions of relu
1/1 [==============================] - 2s 2s/step - loss: 0.1218 - accuracy: 0.9654
testing model with hidden layer size of 32 and activation functions of sigmoid
1/1 [==============================] - 2s 2s/step - loss: 0.1326 - accuracy: 0.9610
testing model with hidden layer size of 32 and activation functions of tanh
1/1 [==============================] - 2s 2s/step - loss: 0.1287 - accuracy: 0.9662
testing model with hidden layer size of 64 and activation functions of relu
1/1 [==============================] - 2s 2s/step - loss: 0.1090 - accuracy: 0.9747
testing model with hidden layer size of 64 and activation functions of sigmoid
1/1 [==============================] - 2s 2s/step - loss: 0.0861 - accuracy: 0.9735
testing model with hidden layer size of 64 and activation functions of tanh
1/1 [==============================] - 2s 2s/step - loss: 0.1133 - accuracy: 0.9712
testing model with hidden layer size of 128 and activation functions of relu
1/1 [==============================] - 2s 2s/step - loss: 0.0893 - accuracy: 0.9780
testing model with hidden layer size of 128 and activation functions of sigmoid
1/1 [==============================] - 2s 2s/step - loss: 0.0782 - accuracy: 0.9776
testing model with hidden layer size of 128 and activation functions of tanh
1/1 [==============================] - 2s 2s/step - loss: 0.0871 - accuracy: 0.9793
testing model with hidden layer size of 256 and activation functions of relu
1/1 [==============================] - 2s 2s/step - loss: 0.0932 - accuracy: 0.9813
testing model with hidden layer size of 256 and activation functions of sigmoid
1/1 [==============================] - 2s 2s/step - loss: 0.0808 - accuracy: 0.9796
testing model with hidden layer size of 256 and activation functions of tanh
1/1 [==============================] - 2s 2s/step - loss: 0.0790 - accuracy: 0.9794
# import additional Python modules to visualize our performance
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# create a simple DataFrame to view the results
n = len(all_width)
test_id = np.arange(1, n+1, 1)
df = pd.DataFrame(data=test_id, columns=["Test_ID"])
df["Width"] = all_width
df["Activation_Func"] = all_func
df["Test_Loss"] = all_test_loss
df["Test_Accuracy"] = all_test_acc
# sort the results by testing accuracy
df.sort_values(by="Test_Accuracy")
| | Test_ID | Width | Activation_Func | Test_Loss | Test_Accuracy |
|---|---|---|---|---|---|
0 | 1 | 2 | relu | 2.301045 | 0.1135 |
1 | 2 | 2 | sigmoid | 1.342214 | 0.4265 |
2 | 3 | 2 | tanh | 1.220447 | 0.6092 |
4 | 5 | 4 | sigmoid | 0.774330 | 0.7393 |
5 | 6 | 4 | tanh | 0.547680 | 0.8361 |
3 | 4 | 4 | relu | 0.480087 | 0.8604 |
7 | 8 | 8 | sigmoid | 0.314240 | 0.9136 |
8 | 9 | 8 | tanh | 0.267101 | 0.9215 |
6 | 7 | 8 | relu | 0.252974 | 0.9288 |
10 | 11 | 16 | sigmoid | 0.195123 | 0.9431 |
11 | 12 | 16 | tanh | 0.170701 | 0.9492 |
9 | 10 | 16 | relu | 0.169006 | 0.9529 |
13 | 14 | 32 | sigmoid | 0.132639 | 0.9610 |
12 | 13 | 32 | relu | 0.121809 | 0.9654 |
14 | 15 | 32 | tanh | 0.128679 | 0.9662 |
17 | 18 | 64 | tanh | 0.113350 | 0.9712 |
16 | 17 | 64 | sigmoid | 0.086111 | 0.9735 |
15 | 16 | 64 | relu | 0.108999 | 0.9747 |
19 | 20 | 128 | sigmoid | 0.078223 | 0.9776 |
18 | 19 | 128 | relu | 0.089255 | 0.9780 |
20 | 21 | 128 | tanh | 0.087111 | 0.9793 |
23 | 24 | 256 | tanh | 0.078979 | 0.9794 |
22 | 23 | 256 | sigmoid | 0.080781 | 0.9796 |
21 | 22 | 256 | relu | 0.093192 | 0.9813 |
fig, ax = plt.subplots(figsize=(4, 4), dpi=100)
ax.set(xscale="log")
sns.scatterplot(data=df, x='Width', y='Test_Accuracy', hue='Activation_Func', style='Activation_Func');
With the two hidden layers of size 256 and using "relu" activation functions for each one, we achieve a testing accuracy of 98.1%.
Depth = 4
Increase the depth of the NN to 4 and only consider widths from 4 to 64 this time.
# HYPERPARAMETERS
NUM_EPOCHS = 20
# activation functions
act_func = ['relu', 'sigmoid', 'tanh']
# we will try to find the best hidden layer size from this list of options:
hidden_layer_sizes = []
print("We will consider the following hidden layer sizes (width):")
for n in range(2, 7):
    width = 2 ** n
    print(width)
    hidden_layer_sizes.append(int(width))
We will consider the following hidden layer sizes (width):
4
8
16
32
64
all_width4 = []
all_func4 = []
all_loss4 = []
all_acc4 = []
for width in hidden_layer_sizes:
    for func in act_func:
        # store the parameters
        all_width4.append(width)
        all_func4.append(func)
        # model definition
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=input_size),          # input layer
            tf.keras.layers.Dense(width, activation=func),            # 1st hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 2nd hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 3rd hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 4th hidden layer
            tf.keras.layers.Dense(output_size, activation='softmax')  # output layer
        ])
        # optimizer, loss function, and metrics
        model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        # training
        model.fit(train_data,
                  epochs=NUM_EPOCHS,
                  callbacks=early_stopping,
                  validation_data=(validation_inputs, validation_targets),
                  verbose=2)
        # testing
        print("testing model with hidden layer size of {x} and activation functions of {y}".format(x=width, y=func))
        test_loss, test_acc = model.evaluate(test_data)
        all_loss4.append(test_loss)
        all_acc4.append(test_acc)
Epoch 1/20 540/540 - 15s - loss: 1.7498 - accuracy: 0.3196 - val_loss: 1.4022 - val_accuracy: 0.4483 Epoch 2/20 540/540 - 10s - loss: 1.2750 - accuracy: 0.5152 - val_loss: 1.2151 - val_accuracy: 0.5435 Epoch 3/20 540/540 - 13s - loss: 1.1357 - accuracy: 0.5938 - val_loss: 1.1054 - val_accuracy: 0.6067 Epoch 4/20 540/540 - 11s - loss: 1.0384 - accuracy: 0.6376 - val_loss: 1.0270 - val_accuracy: 0.6410 Epoch 5/20 540/540 - 12s - loss: 0.9702 - accuracy: 0.6679 - val_loss: 0.9694 - val_accuracy: 0.6703 Epoch 6/20 540/540 - 12s - loss: 0.9240 - accuracy: 0.6904 - val_loss: 0.9294 - val_accuracy: 0.6902 Epoch 7/20 540/540 - 13s - loss: 0.8891 - accuracy: 0.7146 - val_loss: 0.8917 - val_accuracy: 0.7153 Epoch 8/20 540/540 - 13s - loss: 0.8548 - accuracy: 0.7345 - val_loss: 0.8600 - val_accuracy: 0.7320 Epoch 9/20 540/540 - 11s - loss: 0.8256 - accuracy: 0.7474 - val_loss: 0.8318 - val_accuracy: 0.7430 Epoch 10/20 540/540 - 9s - loss: 0.8017 - accuracy: 0.7569 - val_loss: 0.8188 - val_accuracy: 0.7483 Epoch 11/20 540/540 - 13s - loss: 0.7801 - accuracy: 0.7656 - val_loss: 0.7970 - val_accuracy: 0.7610 Epoch 12/20 540/540 - 13s - loss: 0.7614 - accuracy: 0.7707 - val_loss: 0.7861 - val_accuracy: 0.7675 Epoch 13/20 540/540 - 12s - loss: 0.7489 - accuracy: 0.7765 - val_loss: 0.7727 - val_accuracy: 0.7717 Epoch 14/20 540/540 - 11s - loss: 0.7375 - accuracy: 0.7799 - val_loss: 0.7604 - val_accuracy: 0.7725 Epoch 15/20 540/540 - 13s - loss: 0.7284 - accuracy: 0.7830 - val_loss: 0.7539 - val_accuracy: 0.7770 Epoch 16/20 540/540 - 13s - loss: 0.7160 - accuracy: 0.7863 - val_loss: 0.7489 - val_accuracy: 0.7785 Epoch 17/20 540/540 - 13s - loss: 0.7102 - accuracy: 0.7894 - val_loss: 0.7409 - val_accuracy: 0.7777 Epoch 18/20 540/540 - 14s - loss: 0.7012 - accuracy: 0.7922 - val_loss: 0.7463 - val_accuracy: 0.7747 Epoch 19/20 540/540 - 12s - loss: 0.6973 - accuracy: 0.7932 - val_loss: 0.7485 - val_accuracy: 0.7815 testing model with hidden layer size of 4 and activation functions of 
relu 1/1 [==============================] - 1s 1s/step - loss: 0.7318 - accuracy: 0.7895 Epoch 1/20 540/540 - 14s - loss: 2.2998 - accuracy: 0.1149 - val_loss: 2.2641 - val_accuracy: 0.1883 Epoch 2/20 540/540 - 14s - loss: 2.1566 - accuracy: 0.2063 - val_loss: 2.0195 - val_accuracy: 0.2072 Epoch 3/20 540/540 - 14s - loss: 1.9117 - accuracy: 0.2326 - val_loss: 1.8335 - val_accuracy: 0.2655 Epoch 4/20 540/540 - 12s - loss: 1.7742 - accuracy: 0.3009 - val_loss: 1.7247 - val_accuracy: 0.3505 Epoch 5/20 540/540 - 11s - loss: 1.6832 - accuracy: 0.3719 - val_loss: 1.6455 - val_accuracy: 0.3813 Epoch 6/20 540/540 - 12s - loss: 1.6080 - accuracy: 0.3948 - val_loss: 1.5778 - val_accuracy: 0.4078 Epoch 7/20 540/540 - 11s - loss: 1.5469 - accuracy: 0.4359 - val_loss: 1.5209 - val_accuracy: 0.4568 Epoch 8/20 540/540 - 11s - loss: 1.4913 - accuracy: 0.4474 - val_loss: 1.4698 - val_accuracy: 0.4417 Epoch 9/20 540/540 - 14s - loss: 1.4432 - accuracy: 0.4479 - val_loss: 1.4212 - val_accuracy: 0.4505 Epoch 10/20 540/540 - 14s - loss: 1.3962 - accuracy: 0.4531 - val_loss: 1.3750 - val_accuracy: 0.4632 Epoch 11/20 540/540 - 12s - loss: 1.3466 - accuracy: 0.4589 - val_loss: 1.3184 - val_accuracy: 0.4657 Epoch 12/20 540/540 - 13s - loss: 1.2958 - accuracy: 0.4645 - val_loss: 1.2744 - val_accuracy: 0.4673 Epoch 13/20 540/540 - 12s - loss: 1.2563 - accuracy: 0.4655 - val_loss: 1.2461 - val_accuracy: 0.4710 Epoch 14/20 540/540 - 13s - loss: 1.2318 - accuracy: 0.4666 - val_loss: 1.2207 - val_accuracy: 0.4713 Epoch 15/20 540/540 - 10s - loss: 1.2103 - accuracy: 0.4696 - val_loss: 1.2027 - val_accuracy: 0.4738 Epoch 16/20 540/540 - 12s - loss: 1.1947 - accuracy: 0.4729 - val_loss: 1.1901 - val_accuracy: 0.4783 Epoch 17/20 540/540 - 13s - loss: 1.1795 - accuracy: 0.4753 - val_loss: 1.1772 - val_accuracy: 0.4887 Epoch 18/20 540/540 - 10s - loss: 1.1691 - accuracy: 0.4938 - val_loss: 1.1677 - val_accuracy: 0.4908 Epoch 19/20 540/540 - 10s - loss: 1.1565 - accuracy: 0.5345 - val_loss: 1.1572 - 
val_accuracy: 0.5552 Epoch 20/20 540/540 - 9s - loss: 1.1403 - accuracy: 0.5558 - val_loss: 1.1369 - val_accuracy: 0.5683 testing model with hidden layer size of 4 and activation functions of sigmoid 1/1 [==============================] - 2s 2s/step - loss: 1.1487 - accuracy: 0.5601 Epoch 1/20 540/540 - 16s - loss: 1.6240 - accuracy: 0.4808 - val_loss: 1.1810 - val_accuracy: 0.5708 Epoch 2/20 540/540 - 13s - loss: 1.0491 - accuracy: 0.5845 - val_loss: 0.9624 - val_accuracy: 0.6140 Epoch 3/20 540/540 - 13s - loss: 0.9275 - accuracy: 0.6291 - val_loss: 0.8981 - val_accuracy: 0.6405 Epoch 4/20 540/540 - 13s - loss: 0.8718 - accuracy: 0.6516 - val_loss: 0.8644 - val_accuracy: 0.6598 Epoch 5/20 540/540 - 13s - loss: 0.8438 - accuracy: 0.6673 - val_loss: 0.8454 - val_accuracy: 0.6738 Epoch 6/20 540/540 - 10s - loss: 0.8198 - accuracy: 0.6977 - val_loss: 0.8160 - val_accuracy: 0.7205 Epoch 7/20 540/540 - 14s - loss: 0.7722 - accuracy: 0.7418 - val_loss: 0.7596 - val_accuracy: 0.7475 Epoch 8/20 540/540 - 12s - loss: 0.7293 - accuracy: 0.7585 - val_loss: 0.7275 - val_accuracy: 0.7578 Epoch 9/20 540/540 - 12s - loss: 0.6995 - accuracy: 0.7655 - val_loss: 0.7080 - val_accuracy: 0.7652 Epoch 10/20 540/540 - 13s - loss: 0.6911 - accuracy: 0.7665 - val_loss: 0.6951 - val_accuracy: 0.7673 Epoch 11/20 540/540 - 13s - loss: 0.6774 - accuracy: 0.7707 - val_loss: 0.6846 - val_accuracy: 0.7720 Epoch 12/20 540/540 - 13s - loss: 0.6708 - accuracy: 0.7732 - val_loss: 0.6849 - val_accuracy: 0.7652 Epoch 13/20 540/540 - 11s - loss: 0.6631 - accuracy: 0.7750 - val_loss: 0.6792 - val_accuracy: 0.7738 Epoch 14/20 540/540 - 7s - loss: 0.6581 - accuracy: 0.7771 - val_loss: 0.6776 - val_accuracy: 0.7703 Epoch 15/20 540/540 - 12s - loss: 0.6539 - accuracy: 0.7782 - val_loss: 0.6711 - val_accuracy: 0.7748 Epoch 16/20 540/540 - 8s - loss: 0.6492 - accuracy: 0.7798 - val_loss: 0.6668 - val_accuracy: 0.7768 Epoch 17/20 540/540 - 7s - loss: 0.6452 - accuracy: 0.7811 - val_loss: 0.6575 - val_accuracy: 
0.7798 Epoch 18/20 540/540 - 9s - loss: 0.6429 - accuracy: 0.7814 - val_loss: 0.6579 - val_accuracy: 0.7785 Epoch 19/20 540/540 - 12s - loss: 0.6392 - accuracy: 0.7824 - val_loss: 0.6563 - val_accuracy: 0.7775 Epoch 20/20 540/540 - 13s - loss: 0.6354 - accuracy: 0.7849 - val_loss: 0.6538 - val_accuracy: 0.7803 testing model with hidden layer size of 4 and activation functions of tanh 1/1 [==============================] - 2s 2s/step - loss: 0.6710 - accuracy: 0.7723 Epoch 1/20 540/540 - 15s - loss: 1.2069 - accuracy: 0.5826 - val_loss: 0.7179 - val_accuracy: 0.7640 Epoch 2/20 540/540 - 12s - loss: 0.6150 - accuracy: 0.8100 - val_loss: 0.5622 - val_accuracy: 0.8278 Epoch 3/20 540/540 - 12s - loss: 0.5116 - accuracy: 0.8484 - val_loss: 0.5002 - val_accuracy: 0.8525 Epoch 4/20 540/540 - 11s - loss: 0.4559 - accuracy: 0.8664 - val_loss: 0.4591 - val_accuracy: 0.8617 Epoch 5/20 540/540 - 9s - loss: 0.4259 - accuracy: 0.8761 - val_loss: 0.4310 - val_accuracy: 0.8747 Epoch 6/20 540/540 - 11s - loss: 0.3995 - accuracy: 0.8849 - val_loss: 0.4074 - val_accuracy: 0.8833 Epoch 7/20 540/540 - 13s - loss: 0.3755 - accuracy: 0.8929 - val_loss: 0.3874 - val_accuracy: 0.8872 Epoch 8/20 540/540 - 9s - loss: 0.3459 - accuracy: 0.9011 - val_loss: 0.3574 - val_accuracy: 0.8983 Epoch 9/20 540/540 - 12s - loss: 0.3296 - accuracy: 0.9062 - val_loss: 0.3509 - val_accuracy: 0.8988 Epoch 10/20 540/540 - 11s - loss: 0.3174 - accuracy: 0.9096 - val_loss: 0.3341 - val_accuracy: 0.9023 Epoch 11/20 540/540 - 8s - loss: 0.3104 - accuracy: 0.9109 - val_loss: 0.3286 - val_accuracy: 0.9062 Epoch 12/20 540/540 - 14s - loss: 0.3050 - accuracy: 0.9132 - val_loss: 0.3171 - val_accuracy: 0.9100 Epoch 13/20 540/540 - 16s - loss: 0.2983 - accuracy: 0.9143 - val_loss: 0.3209 - val_accuracy: 0.9097 Epoch 14/20 540/540 - 16s - loss: 0.2895 - accuracy: 0.9170 - val_loss: 0.3159 - val_accuracy: 0.9082 Epoch 15/20 540/540 - 13s - loss: 0.2823 - accuracy: 0.9185 - val_loss: 0.3052 - val_accuracy: 0.9138 Epoch 
testing model with hidden layer size of 8 and activation functions of relu: loss: 0.3027 - accuracy: 0.9139
testing model with hidden layer size of 8 and activation functions of sigmoid: loss: 0.5414 - accuracy: 0.8640
testing model with hidden layer size of 8 and activation functions of tanh: loss: 0.2613 - accuracy: 0.9277
testing model with hidden layer size of 16 and activation functions of relu: loss: 0.1787 - accuracy: 0.9491
testing model with hidden layer size of 16 and activation functions of sigmoid: loss: 0.2357 - accuracy: 0.9399
testing model with hidden layer size of 16 and activation functions of tanh: loss: 0.1741 - accuracy: 0.9509
testing model with hidden layer size of 32 and activation functions of relu: loss: 0.1390 - accuracy: 0.9631
testing model with hidden layer size of 32 and activation functions of sigmoid: loss: 0.1760 - accuracy: 0.9526
testing model with hidden layer size of 32 and activation functions of tanh: loss: 0.1314 - accuracy: 0.9654
testing model with hidden layer size of 64 and activation functions of relu: loss: 0.1157 - accuracy: 0.9767
testing model with hidden layer size of 64 and activation functions of sigmoid: loss: 0.1376 - accuracy: 0.9660
testing model with hidden layer size of 64 and activation functions of tanh: loss: 0.1036 - accuracy: 0.9759
# collect the results of the width/activation sweep into a DataFrame
import pandas as pd

n4 = len(all_width4)
test_id4 = np.arange(1, n4 + 1, 1)
df4 = pd.DataFrame(data=test_id4, columns=["Test_ID"])
df4["Width"] = all_width4
df4["Activation_Func"] = all_func4
df4["Test_Loss"] = all_loss4
df4["Test_Accuracy"] = all_acc4
df4.sort_values(by="Test_Accuracy")
| | Test_ID | Width | Activation_Func | Test_Loss | Test_Accuracy |
|---|---|---|---|---|---|
| 1 | 2 | 4 | sigmoid | 1.148699 | 0.5601 |
| 2 | 3 | 4 | tanh | 0.670961 | 0.7723 |
| 0 | 1 | 4 | relu | 0.731848 | 0.7895 |
| 4 | 5 | 8 | sigmoid | 0.541442 | 0.8640 |
| 3 | 4 | 8 | relu | 0.302695 | 0.9139 |
| 5 | 6 | 8 | tanh | 0.261323 | 0.9277 |
| 7 | 8 | 16 | sigmoid | 0.235728 | 0.9399 |
| 6 | 7 | 16 | relu | 0.178691 | 0.9491 |
| 8 | 9 | 16 | tanh | 0.174120 | 0.9509 |
| 10 | 11 | 32 | sigmoid | 0.176028 | 0.9526 |
| 9 | 10 | 32 | relu | 0.138972 | 0.9631 |
| 11 | 12 | 32 | tanh | 0.131395 | 0.9654 |
| 13 | 14 | 64 | sigmoid | 0.137561 | 0.9660 |
| 14 | 15 | 64 | tanh | 0.103609 | 0.9759 |
| 12 | 13 | 64 | relu | 0.115719 | 0.9767 |
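Since the table is sorted ascending by test accuracy, the best configuration sits in the last row; it can also be pulled out directly with `idxmax`. A minimal sketch on a stand-in frame (the real `df4` lives in the notebook session):

```python
import pandas as pd

# stand-in for a few rows of df4; the real frame is built in the cell above
df = pd.DataFrame({"Width": [4, 8, 64],
                   "Activation_Func": ["sigmoid", "relu", "relu"],
                   "Test_Accuracy": [0.5601, 0.9139, 0.9767]})

# row label of the highest test accuracy, then the full row
best = df.loc[df["Test_Accuracy"].idxmax()]
print(best["Width"], best["Activation_Func"])  # -> 64 relu
```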
# scatter plot of test accuracy vs. hidden layer width (log scale on x)
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(4, 4), dpi=100)
ax.set(xscale="log")
sns.scatterplot(data=df4, x='Width', y='Test_Accuracy', hue='Activation_Func', style='Activation_Func');
We achieved a maximum classification accuracy of 97.7% this time.
The "relu" and "tanh" activation functions outperform "sigmoid" at every width tested. We can increase the width again with one of these two functions to see whether the accuracy improves further.
func = 'relu'
width = 128
# model definition
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=input_size),          # input layer
    tf.keras.layers.Dense(width, activation=func),            # 1st hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 2nd hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 3rd hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 4th hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax')  # output layer
])
# optimizer, loss function, and metrics
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# training
model.fit(train_data,
          epochs=NUM_EPOCHS,
          callbacks=early_stopping,
          validation_data=(validation_inputs, validation_targets),
          verbose=2)
# testing
print("testing model with hidden layer size of {x} and activation functions of {y}".format(x=width, y=func))
test_loss, test_acc = model.evaluate(test_data)
Epoch 1/20 540/540 - 3s - loss: 0.3043 - accuracy: 0.9087 - val_loss: 0.1557 - val_accuracy: 0.9562
Epoch 2/20 540/540 - 2s - loss: 0.1184 - accuracy: 0.9640 - val_loss: 0.1044 - val_accuracy: 0.9705
Epoch 3/20 540/540 - 2s - loss: 0.0802 - accuracy: 0.9752 - val_loss: 0.0850 - val_accuracy: 0.9738
Epoch 4/20 540/540 - 2s - loss: 0.0638 - accuracy: 0.9801 - val_loss: 0.0721 - val_accuracy: 0.9780
Epoch 5/20 540/540 - 2s - loss: 0.0500 - accuracy: 0.9839 - val_loss: 0.0870 - val_accuracy: 0.9748
Epoch 6/20 540/540 - 2s - loss: 0.0421 - accuracy: 0.9863 - val_loss: 0.0624 - val_accuracy: 0.9810
Epoch 7/20 540/540 - 2s - loss: 0.0369 - accuracy: 0.9879 - val_loss: 0.0584 - val_accuracy: 0.9817
Epoch 8/20 540/540 - 2s - loss: 0.0318 - accuracy: 0.9896 - val_loss: 0.0537 - val_accuracy: 0.9827
Epoch 9/20 540/540 - 2s - loss: 0.0279 - accuracy: 0.9907 - val_loss: 0.0688 - val_accuracy: 0.9795
Epoch 10/20 540/540 - 2s - loss: 0.0212 - accuracy: 0.9934 - val_loss: 0.0626 - val_accuracy: 0.9833
testing model with hidden layer size of 128 and activation functions of relu
1/1 [==============================] - 0s 308ms/step - loss: 0.1067 - accuracy: 0.9752
Testing Accuracy: 97.5%
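The 97.5% figure is simply the raw `test_acc` returned by `model.evaluate` rounded to one decimal place; a quick sketch with a stand-in value:

```python
# stand-in for the (test_loss, test_acc) pair returned by model.evaluate(test_data)
test_loss, test_acc = 0.1067, 0.9752

# format the fractional accuracy as the percentage quoted in the text
summary = "Testing Accuracy: {:.1f}%".format(test_acc * 100.0)
print(summary)  # -> Testing Accuracy: 97.5%
```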
func = 'relu'
width = 256
# model definition
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=input_size),          # input layer
    tf.keras.layers.Dense(width, activation=func),            # 1st hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 2nd hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 3rd hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 4th hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax')  # output layer
])
# optimizer, loss function, and metrics
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# training
model.fit(train_data,
          epochs=NUM_EPOCHS,
          callbacks=early_stopping,
          validation_data=(validation_inputs, validation_targets),
          verbose=2);
# testing
print("testing model with hidden layer size of {x} and activation functions of {y}".format(x=width, y=func))
test_loss, test_acc = model.evaluate(test_data)
Epoch 1/20 540/540 - 3s - loss: 0.2500 - accuracy: 0.9233 - val_loss: 0.1385 - val_accuracy: 0.9562
Epoch 2/20 540/540 - 3s - loss: 0.0986 - accuracy: 0.9700 - val_loss: 0.1118 - val_accuracy: 0.9658
Epoch 3/20 540/540 - 3s - loss: 0.0679 - accuracy: 0.9791 - val_loss: 0.0761 - val_accuracy: 0.9773
Epoch 4/20 540/540 - 3s - loss: 0.0523 - accuracy: 0.9836 - val_loss: 0.0734 - val_accuracy: 0.9795
Epoch 5/20 540/540 - 3s - loss: 0.0396 - accuracy: 0.9876 - val_loss: 0.0628 - val_accuracy: 0.9820
Epoch 6/20 540/540 - 3s - loss: 0.0390 - accuracy: 0.9877 - val_loss: 0.0541 - val_accuracy: 0.9822
Epoch 7/20 540/540 - 3s - loss: 0.0335 - accuracy: 0.9895 - val_loss: 0.0484 - val_accuracy: 0.9848
Epoch 8/20 540/540 - 3s - loss: 0.0265 - accuracy: 0.9920 - val_loss: 0.0534 - val_accuracy: 0.9840
Epoch 9/20 540/540 - 3s - loss: 0.0254 - accuracy: 0.9921 - val_loss: 0.0452 - val_accuracy: 0.9877
Epoch 10/20 540/540 - 3s - loss: 0.0209 - accuracy: 0.9934 - val_loss: 0.0403 - val_accuracy: 0.9863
Epoch 11/20 540/540 - 3s - loss: 0.0235 - accuracy: 0.9927 - val_loss: 0.0361 - val_accuracy: 0.9902
Epoch 12/20 540/540 - 3s - loss: 0.0212 - accuracy: 0.9938 - val_loss: 0.0430 - val_accuracy: 0.9867
Epoch 13/20 540/540 - 3s - loss: 0.0170 - accuracy: 0.9949 - val_loss: 0.0355 - val_accuracy: 0.9893
Epoch 14/20 540/540 - 3s - loss: 0.0144 - accuracy: 0.9955 - val_loss: 0.0264 - val_accuracy: 0.9917
Epoch 15/20 540/540 - 3s - loss: 0.0196 - accuracy: 0.9940 - val_loss: 0.0264 - val_accuracy: 0.9920
Epoch 16/20 540/540 - 3s - loss: 0.0142 - accuracy: 0.9954 - val_loss: 0.0259 - val_accuracy: 0.9928
Epoch 17/20 540/540 - 3s - loss: 0.0127 - accuracy: 0.9962 - val_loss: 0.0276 - val_accuracy: 0.9917
Epoch 18/20 540/540 - 3s - loss: 0.0130 - accuracy: 0.9960 - val_loss: 0.0293 - val_accuracy: 0.9918
testing model with hidden layer size of 256 and activation functions of relu
1/1 [==============================] - 0s 390ms/step - loss: 0.1055 - accuracy: 0.9813
Testing Accuracy: 98.1%
Depth = 8
Based on the results above, for the deeper models we can drop the sigmoid activation function, exclude the hidden layer size of 4, and add a width of 128 to the list of candidates.
# HYPERPARAMETERS
NUM_EPOCHS = 20
# activation functions
act_func = ['relu', 'tanh']
# we will try to find the best hidden layer size from this list of options:
hidden_layer_sizes = []
print("We will consider the following hidden layer sizes (width):")
for n in range(3, 8):
    width = 2 ** n
    print(width)
    hidden_layer_sizes.append(int(width))
We will consider the following hidden layer sizes (width):
8
16
32
64
128
all_width8 = []
all_func8 = []
all_loss8 = []
all_acc8 = []
for width in hidden_layer_sizes:
    for func in act_func:
        # store the parameters
        all_width8.append(width)
        all_func8.append(func)
        # model definition
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=input_size),          # input layer
            tf.keras.layers.Dense(width, activation=func),            # 1st hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 2nd hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 3rd hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 4th hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 5th hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 6th hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 7th hidden layer
            tf.keras.layers.Dense(width, activation=func),            # 8th hidden layer
            tf.keras.layers.Dense(output_size, activation='softmax')  # output layer
        ])
        # optimizer, loss function, and metrics
        model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        # training
        model.fit(train_data,
                  epochs=NUM_EPOCHS,
                  callbacks=early_stopping,
                  validation_data=(validation_inputs, validation_targets),
                  verbose=0)
        # testing
        print("testing model with hidden layer size of {x} and activation functions of {y}".format(x=width, y=func))
        test_loss, test_acc = model.evaluate(test_data)
        all_loss8.append(test_loss)
        all_acc8.append(test_acc)
testing model with hidden layer size of 8 and activation functions of relu
1/1 [==============================] - 0s 352ms/step - loss: 0.3723 - accuracy: 0.8932
testing model with hidden layer size of 8 and activation functions of tanh
1/1 [==============================] - 0s 318ms/step - loss: 0.3045 - accuracy: 0.9180
testing model with hidden layer size of 16 and activation functions of relu
1/1 [==============================] - 0s 319ms/step - loss: 0.2219 - accuracy: 0.9405
testing model with hidden layer size of 16 and activation functions of tanh
1/1 [==============================] - 1s 537ms/step - loss: 0.1677 - accuracy: 0.9539
testing model with hidden layer size of 32 and activation functions of relu
1/1 [==============================] - 1s 679ms/step - loss: 0.1261 - accuracy: 0.9637
testing model with hidden layer size of 32 and activation functions of tanh
1/1 [==============================] - 0s 374ms/step - loss: 0.1280 - accuracy: 0.9649
testing model with hidden layer size of 64 and activation functions of relu
1/1 [==============================] - 0s 322ms/step - loss: 0.1149 - accuracy: 0.9718
testing model with hidden layer size of 64 and activation functions of tanh
1/1 [==============================] - 1s 518ms/step - loss: 0.1260 - accuracy: 0.9688
testing model with hidden layer size of 128 and activation functions of relu
1/1 [==============================] - 0s 399ms/step - loss: 0.1069 - accuracy: 0.9744
testing model with hidden layer size of 128 and activation functions of tanh
1/1 [==============================] - 1s 582ms/step - loss: 0.1215 - accuracy: 0.9715
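The depth-8 results collected in `all_width8`, `all_func8`, `all_loss8`, and `all_acc8` can be tabulated the same way as the depth-4 sweep. A sketch with stand-in values taken from the first four runs above (the name `df_depth8` is mine, chosen to avoid clashing with the `df8` built later for the randomized sweep):

```python
import numpy as np
import pandas as pd

# stand-in results; the real lists are filled by the sweep above
all_width8 = [8, 8, 16, 16]
all_func8 = ["relu", "tanh", "relu", "tanh"]
all_loss8 = [0.3723, 0.3045, 0.2219, 0.1677]
all_acc8 = [0.8932, 0.9180, 0.9405, 0.9539]

df_depth8 = pd.DataFrame({"Test_ID": np.arange(1, len(all_width8) + 1),
                          "Width": all_width8,
                          "Activation_Func": all_func8,
                          "Test_Loss": all_loss8,
                          "Test_Accuracy": all_acc8})
print(df_depth8.sort_values(by="Test_Accuracy"))
```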
Randomize the Activation Functions
# randomize the activation functions used across the hidden layers of the deep NN
act_func = ['relu', 'sigmoid', 'tanh']
depth = 8
func_list = []
for layer in range(0, depth):
    fi = np.random.randint(low=0, high=3)
    func = act_func[fi]
    print(func)
    func_list.append(func)
relu
tanh
sigmoid
sigmoid
tanh
relu
tanh
sigmoid
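The loop above can be wrapped in a small helper so the randomized sweep below could reuse it; `random_act_funcs` is a name introduced here for illustration, not one defined elsewhere in the notebook:

```python
import numpy as np

def random_act_funcs(depth, act_func=('relu', 'sigmoid', 'tanh')):
    """Draw one activation-function name per hidden layer, uniformly at random."""
    return [act_func[np.random.randint(low=0, high=len(act_func))]
            for _ in range(depth)]

func_list = random_act_funcs(8)
print(func_list)
```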
# HYPERPARAMETERS
NUM_EPOCHS = 20
# activation functions
act_func = ['relu', 'tanh', 'sigmoid']
# we will try to find the best hidden layer size from this list of options:
hidden_layer_sizes = []
print("We will consider the following hidden layer sizes (width):")
for n in range(4, 8):
    width = 2 ** n
    print(width)
    hidden_layer_sizes.append(int(width))
We will consider the following hidden layer sizes (width):
16
32
64
128
all_width_rand = []
all_func_rand = []
all_loss_rand = []
all_acc_rand = []
depth = 8
for width in hidden_layer_sizes:
    for i in range(0, 5):
        # generate a random ordering of activation functions
        func_list = []
        for layer in range(0, depth):
            fi = np.random.randint(low=0, high=3)
            func = act_func[fi]
            func_list.append(func)
        # store the parameters
        all_width_rand.append(width)
        all_func_rand.append(func_list)
        # model definition
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=input_size),          # input layer
            tf.keras.layers.Dense(width, activation=func_list[0]),    # 1st hidden layer
            tf.keras.layers.Dense(width, activation=func_list[1]),    # 2nd hidden layer
            tf.keras.layers.Dense(width, activation=func_list[2]),    # 3rd hidden layer
            tf.keras.layers.Dense(width, activation=func_list[3]),    # 4th hidden layer
            tf.keras.layers.Dense(width, activation=func_list[4]),    # 5th hidden layer
            tf.keras.layers.Dense(width, activation=func_list[5]),    # 6th hidden layer
            tf.keras.layers.Dense(width, activation=func_list[6]),    # 7th hidden layer
            tf.keras.layers.Dense(width, activation=func_list[7]),    # 8th hidden layer
            tf.keras.layers.Dense(output_size, activation='softmax')  # output layer
        ])
        # optimizer, loss function, and metrics
        model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        # training
        model.fit(train_data,
                  epochs=NUM_EPOCHS,
                  callbacks=early_stopping,
                  validation_data=(validation_inputs, validation_targets),
                  verbose=0)
        # testing (note: func holds only the last randomly drawn activation;
        # the full per-layer list is stored in all_func_rand)
        print("testing model with hidden layer size of {x} and activation functions of {y}".format(x=width, y=func))
        test_loss, test_acc = model.evaluate(test_data)
        all_loss_rand.append(test_loss)
        all_acc_rand.append(test_acc)
testing model with hidden layer size of 16 and activation functions of tanh
1/1 [==============================] - 0s 389ms/step - loss: 0.2801 - accuracy: 0.9276
testing model with hidden layer size of 16 and activation functions of tanh
1/1 [==============================] - 0s 421ms/step - loss: 0.2791 - accuracy: 0.9292
testing model with hidden layer size of 16 and activation functions of tanh
1/1 [==============================] - 0s 348ms/step - loss: 0.1973 - accuracy: 0.9457
testing model with hidden layer size of 16 and activation functions of sigmoid
1/1 [==============================] - 0s 455ms/step - loss: 0.2175 - accuracy: 0.9436
testing model with hidden layer size of 16 and activation functions of relu
1/1 [==============================] - 1s 581ms/step - loss: 0.2927 - accuracy: 0.9213
testing model with hidden layer size of 32 and activation functions of sigmoid
1/1 [==============================] - 1s 511ms/step - loss: 0.2998 - accuracy: 0.9226
testing model with hidden layer size of 32 and activation functions of relu
1/1 [==============================] - 0s 443ms/step - loss: 0.1394 - accuracy: 0.9639
testing model with hidden layer size of 32 and activation functions of relu
1/1 [==============================] - 0s 378ms/step - loss: 0.2340 - accuracy: 0.9462
testing model with hidden layer size of 32 and activation functions of sigmoid
1/1 [==============================] - 0s 395ms/step - loss: 0.1584 - accuracy: 0.9601
testing model with hidden layer size of 32 and activation functions of tanh
1/1 [==============================] - 0s 405ms/step - loss: 0.1521 - accuracy: 0.9594
testing model with hidden layer size of 64 and activation functions of tanh
1/1 [==============================] - 0s 310ms/step - loss: 0.1687 - accuracy: 0.9615
testing model with hidden layer size of 64 and activation functions of relu
1/1 [==============================] - 0s 366ms/step - loss: 0.1183 - accuracy: 0.9732
testing model with hidden layer size of 64 and activation functions of relu
1/1 [==============================] - 1s 595ms/step - loss: 0.1676 - accuracy: 0.9563
testing model with hidden layer size of 64 and activation functions of sigmoid
1/1 [==============================] - 0s 432ms/step - loss: 0.1549 - accuracy: 0.9642
testing model with hidden layer size of 64 and activation functions of relu
1/1 [==============================] - 0s 433ms/step - loss: 0.1400 - accuracy: 0.9647
testing model with hidden layer size of 128 and activation functions of relu
1/1 [==============================] - 0s 455ms/step - loss: 0.1461 - accuracy: 0.9613
testing model with hidden layer size of 128 and activation functions of tanh
1/1 [==============================] - 1s 576ms/step - loss: 0.1105 - accuracy: 0.9703
testing model with hidden layer size of 128 and activation functions of relu
1/1 [==============================] - 1s 550ms/step - loss: 0.1521 - accuracy: 0.9668
testing model with hidden layer size of 128 and activation functions of tanh
1/1 [==============================] - 0s 470ms/step - loss: 0.1107 - accuracy: 0.9754
testing model with hidden layer size of 128 and activation functions of sigmoid
1/1 [==============================] - 0s 394ms/step - loss: 0.1199 - accuracy: 0.9724
n8 = len(all_width_rand)
test_id8 = np.arange(1, n8+1, 1)
df8 = pd.DataFrame(data=test_id8, columns=["Test_ID"])
df8["Width"] = all_width_rand
df8["Activation_Func"] = all_func_rand
df8["Test_Loss"] = all_loss_rand
df8["Test_Accuracy"] = all_acc_rand
df8.sort_values(by="Test_Accuracy")
  | Test_ID | Width | Activation_Func | Test_Loss | Test_Accuracy |
---|---|---|---|---|---|
4 | 5 | 16 | [relu, sigmoid, tanh, relu, sigmoid, relu, tan... | 0.292747 | 0.9213 |
5 | 6 | 32 | [sigmoid, sigmoid, sigmoid, tanh, relu, relu, ... | 0.299809 | 0.9226 |
0 | 1 | 16 | [sigmoid, relu, sigmoid, tanh, relu, relu, rel... | 0.280093 | 0.9276 |
1 | 2 | 16 | [tanh, relu, sigmoid, tanh, sigmoid, tanh, tan... | 0.279057 | 0.9292 |
3 | 4 | 16 | [relu, relu, relu, relu, sigmoid, tanh, tanh, ... | 0.217534 | 0.9436 |
2 | 3 | 16 | [relu, relu, relu, tanh, sigmoid, relu, relu, ... | 0.197313 | 0.9457 |
7 | 8 | 32 | [tanh, relu, sigmoid, sigmoid, sigmoid, tanh, ... | 0.233974 | 0.9462 |
12 | 13 | 64 | [tanh, sigmoid, relu, relu, sigmoid, tanh, sig... | 0.167599 | 0.9563 |
9 | 10 | 32 | [tanh, relu, sigmoid, relu, tanh, tanh, relu, ... | 0.152140 | 0.9594 |
8 | 9 | 32 | [relu, relu, relu, sigmoid, tanh, relu, tanh, ... | 0.158423 | 0.9601 |
15 | 16 | 128 | [tanh, sigmoid, sigmoid, relu, tanh, relu, sig... | 0.146052 | 0.9613 |
10 | 11 | 64 | [tanh, tanh, tanh, relu, sigmoid, sigmoid, sig... | 0.168668 | 0.9615 |
6 | 7 | 32 | [tanh, tanh, tanh, tanh, tanh, tanh, sigmoid, ... | 0.139386 | 0.9639 |
13 | 14 | 64 | [relu, sigmoid, tanh, sigmoid, tanh, relu, rel... | 0.154920 | 0.9642 |
14 | 15 | 64 | [tanh, relu, sigmoid, tanh, relu, tanh, tanh, ... | 0.139984 | 0.9647 |
17 | 18 | 128 | [tanh, tanh, sigmoid, relu, sigmoid, sigmoid, ... | 0.152139 | 0.9668 |
16 | 17 | 128 | [sigmoid, tanh, tanh, tanh, sigmoid, tanh, rel... | 0.110507 | 0.9703 |
19 | 20 | 128 | [relu, relu, sigmoid, tanh, sigmoid, tanh, rel... | 0.119908 | 0.9724 |
11 | 12 | 64 | [relu, relu, sigmoid, sigmoid, tanh, sigmoid, ... | 0.118340 | 0.9732 |
18 | 19 | 128 | [relu, tanh, tanh, relu, tanh, sigmoid, sigmoi... | 0.110655 | 0.9754 |
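Rather than scanning the sorted table by eye, the best row can also be pulled out programmatically with pandas. A minimal sketch, using a toy DataFrame with the same column names as df8 (the values here are illustrative, not the full results above):

```python
import pandas as pd

# toy frame mirroring df8's columns (values copied from a few rows above)
df = pd.DataFrame({
    "Test_ID": [1, 12, 19],
    "Width": [16, 64, 128],
    "Test_Loss": [0.280093, 0.118340, 0.110655],
    "Test_Accuracy": [0.9276, 0.9732, 0.9754],
})

# idxmax returns the index label of the row with the highest test accuracy
best = df.loc[df["Test_Accuracy"].idxmax()]
print(best["Width"], best["Test_Accuracy"])
```

The same one-liner applied to df8 (`df8.loc[df8["Test_Accuracy"].idxmax()]`) recovers the width-128 run with 97.5% accuracy.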
By randomly sampling the possible activation functions at a depth of 8, the best test accuracy was 97.5%, achieved with a hidden layer width of 128. Next, we train a few more models with a width of 256: first with randomly sampled activation functions again, and then with a single fixed activation function used in every hidden layer.
depth = 8
width = 256
act_func = ['relu', 'tanh']  # sigmoid removed based on its weaker results above

for i in range(0, 3):
    # generate a random order of activation functions for the 8 hidden layers
    func_list = []
    for layer in range(0, depth):
        fi = np.random.randint(low=0, high=2)
        func = act_func[fi]
        func_list.append(func)

    # model definition
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=input_size),          # input layer
        tf.keras.layers.Dense(width, activation=func_list[0]),    # 1st hidden layer
        tf.keras.layers.Dense(width, activation=func_list[1]),    # 2nd hidden layer
        tf.keras.layers.Dense(width, activation=func_list[2]),    # 3rd hidden layer
        tf.keras.layers.Dense(width, activation=func_list[3]),    # 4th hidden layer
        tf.keras.layers.Dense(width, activation=func_list[4]),    # 5th
        tf.keras.layers.Dense(width, activation=func_list[5]),    # 6th
        tf.keras.layers.Dense(width, activation=func_list[6]),    # 7th
        tf.keras.layers.Dense(width, activation=func_list[7]),    # 8th
        tf.keras.layers.Dense(output_size, activation='softmax')  # output layer
    ])

    # optimizer, loss function, and metrics
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # training
    model.fit(train_data,
              epochs=NUM_EPOCHS,
              callbacks=early_stopping,
              validation_data=(validation_inputs, validation_targets),
              verbose=0)

    # testing (note: func holds only the last sampled activation; func_list holds all 8)
    print("testing model with hidden layer size of {x} and activation functions of {y}".format(x=width, y=func))
    test_loss, test_acc = model.evaluate(test_data)
testing model with hidden layer size of 256 and activation functions of relu
1/1 [==============================] - 0s 464ms/step - loss: 0.1324 - accuracy: 0.9665
testing model with hidden layer size of 256 and activation functions of tanh
1/1 [==============================] - 0s 478ms/step - loss: 0.1142 - accuracy: 0.9727
testing model with hidden layer size of 256 and activation functions of tanh
1/1 [==============================] - 0s 445ms/step - loss: 0.1219 - accuracy: 0.9659
# specify the optimal parameters
width = 256
func = 'relu'

# model definition
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=input_size),          # input layer
    tf.keras.layers.Dense(width, activation=func),            # 1st hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 2nd hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 3rd hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 4th hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 5th
    tf.keras.layers.Dense(width, activation=func),            # 6th
    tf.keras.layers.Dense(width, activation=func),            # 7th
    tf.keras.layers.Dense(width, activation=func),            # 8th
    tf.keras.layers.Dense(output_size, activation='softmax')  # output layer
])

# optimizer, loss function, and metrics
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# training
model.fit(train_data,
          epochs=NUM_EPOCHS,
          callbacks=early_stopping,
          validation_data=(validation_inputs, validation_targets),
          verbose=2);

# testing
print("testing model with hidden layer size of {x} and activation functions of {y}".format(x=width, y=func))
test_loss, test_acc = model.evaluate(test_data)
Epoch 1/20
540/540 - 8s - loss: 0.2782 - accuracy: 0.9141 - val_loss: 0.1249 - val_accuracy: 0.9602
Epoch 2/20
540/540 - 7s - loss: 0.1229 - accuracy: 0.9645 - val_loss: 0.1264 - val_accuracy: 0.9608
Epoch 3/20
540/540 - 7s - loss: 0.0916 - accuracy: 0.9744 - val_loss: 0.1109 - val_accuracy: 0.9692
Epoch 4/20
540/540 - 7s - loss: 0.0717 - accuracy: 0.9801 - val_loss: 0.1070 - val_accuracy: 0.9692
Epoch 5/20
540/540 - 7s - loss: 0.0604 - accuracy: 0.9835 - val_loss: 0.1129 - val_accuracy: 0.9718
Epoch 6/20
540/540 - 6s - loss: 0.0547 - accuracy: 0.9848 - val_loss: 0.0640 - val_accuracy: 0.9825
Epoch 7/20
540/540 - 7s - loss: 0.0479 - accuracy: 0.9870 - val_loss: 0.0690 - val_accuracy: 0.9807
Epoch 8/20
540/540 - 7s - loss: 0.0418 - accuracy: 0.9890 - val_loss: 0.0869 - val_accuracy: 0.9785
testing model with hidden layer size of 256 and activation functions of relu
1/1 [==============================] - 1s 653ms/step - loss: 0.1188 - accuracy: 0.9709
# specify the optimal parameters
width = 256
func = 'tanh'

# model definition
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=input_size),          # input layer
    tf.keras.layers.Dense(width, activation=func),            # 1st hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 2nd hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 3rd hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 4th hidden layer
    tf.keras.layers.Dense(width, activation=func),            # 5th
    tf.keras.layers.Dense(width, activation=func),            # 6th
    tf.keras.layers.Dense(width, activation=func),            # 7th
    tf.keras.layers.Dense(width, activation=func),            # 8th
    tf.keras.layers.Dense(output_size, activation='softmax')  # output layer
])

# optimizer, loss function, and metrics
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# training
model.fit(train_data,
          epochs=NUM_EPOCHS,
          callbacks=early_stopping,
          validation_data=(validation_inputs, validation_targets),
          verbose=2);

# testing
print("testing model with hidden layer size of {x} and activation functions of {y}".format(x=width, y=func))
test_loss, test_acc = model.evaluate(test_data)
Epoch 1/20
540/540 - 5s - loss: 0.3137 - accuracy: 0.9052 - val_loss: 0.2344 - val_accuracy: 0.9320
Epoch 2/20
540/540 - 4s - loss: 0.1777 - accuracy: 0.9485 - val_loss: 0.1608 - val_accuracy: 0.9550
Epoch 3/20
540/540 - 5s - loss: 0.1310 - accuracy: 0.9613 - val_loss: 0.1448 - val_accuracy: 0.9582
Epoch 4/20
540/540 - 7s - loss: 0.1107 - accuracy: 0.9673 - val_loss: 0.1101 - val_accuracy: 0.9700
Epoch 5/20
540/540 - 7s - loss: 0.0926 - accuracy: 0.9721 - val_loss: 0.1193 - val_accuracy: 0.9683
Epoch 6/20
540/540 - 7s - loss: 0.0764 - accuracy: 0.9768 - val_loss: 0.1325 - val_accuracy: 0.9632
testing model with hidden layer size of 256 and activation functions of tanh
1/1 [==============================] - 1s 691ms/step - loss: 0.1414 - accuracy: 0.9606
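The two fixed-activation runs can now be compared head to head. A small sketch that collects the final test metrics copied from the evaluation logs above and picks the stronger activation:

```python
# final test metrics for the two width-256, depth-8 models (values from the logs above)
results = {
    "relu": {"test_loss": 0.1188, "test_accuracy": 0.9709},
    "tanh": {"test_loss": 0.1414, "test_accuracy": 0.9606},
}

# select the activation with the higher test accuracy
best_func = max(results, key=lambda f: results[f]["test_accuracy"])
print(best_func)  # relu edges out tanh on this test set
```

So with all eight hidden layers fixed to one activation, relu gives the better test accuracy here (97.1% vs 96.1%), though both fall slightly short of the best mixed-activation run at width 128.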