Comet.ml Confusion Matrix

This page is also available as an executable or viewable Jupyter Notebook.

Comet.ml can generate a variety of visualizations, including line charts, scatter charts, bar charts, and histograms. This notebook explores Comet's confusion matrix chart.

Setup

The first thing we'll do in this notebook tutorial is install comet_ml and other items that we'll need for this demonstration. That will include keras, tensorflow, and numpy.

First, comet_ml (you may want to do this slightly differently on your computer):

In [ ]:
%pip install --upgrade --upgrade-strategy eager --user comet_ml 

And now tensorflow, keras, and numpy:

In [ ]:
%pip install --upgrade --upgrade-strategy eager --user keras tensorflow numpy

As the output may suggest, if anything got updated, it might be a good idea to restart the kernel and continue from here.
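
After a restart, it can be useful to confirm which versions are actually active. The following is a small sketch using only the standard library; the helper name installed_version is our own, not part of any of these packages:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the version of each package this notebook relies on
for package in ("comet_ml", "keras", "tensorflow", "numpy"):
    print(package, installed_version(package) or "not installed")
```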

Comet Configuration

To run the following experiments, you'll need to set your COMET_API_KEY. The easiest way to do this is to set the value in a cell like this:

import comet_ml

comet_ml.save(api_key="...")

where you replace the ...'s with your key.

You can get your COMET_API_KEY under your quickstart link (replace YOUR_USERNAME with your Comet.ml username):

https://www.comet.ml/YOUR_USERNAME/quickstart

Example 1: Simple Confusion Matrix

First, we will create an experiment:

In [1]:
from comet_ml import Experiment

We're not interested at the moment in logging environment details or the code and related items, so we won't log those:

In [2]:
experiment = Experiment(project_name="confusion-matrix", log_env_details=False, log_code=False)
COMET INFO: Experiment is live on comet.ml https://www.comet.ml/cometpublic/confusion-matrix/0d1a88a5038b4507a31061039c269f36

As a simple example, let's consider that we have these six patterns that are our output targets (desired output):

In [3]:
desired_output = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

Imagine that this is a classification task where each target (desired output) is composed of three output values, with one unit "on" (set to 1) and the others "off" (set to 0). This is sometimes called a "one-hot" representation and is a common way of representing categories. There are 6 patterns, 2 for each category.
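
As a quick illustration of the one-hot idea, here is a minimal NumPy sketch that builds the same six patterns from integer category labels and then recovers the labels again. The variable names (labels, one_hot, recovered) are ours, chosen for this sketch:

```python
import numpy as np

# Integer category labels for the six patterns (two per category)
labels = np.array([0, 1, 2, 0, 1, 2])
num_categories = 3

# Row i of the identity matrix is the one-hot vector for category i
one_hot = np.eye(num_categories, dtype=int)[labels]
print(one_hot.tolist())

# argmax recovers the category index from each one-hot row
recovered = one_hot.argmax(axis=1)
print(recovered.tolist())
```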

Now, let's make up some sample data that a model might produce. Let's say initially that the output is pretty random and doesn't even add up to 1 for each row. This may be unrealistic, as many such classification tasks use an error/loss metric based on cross entropy, which would push the sum of each row's values closer to 1. That might be desirable, but it is not required for our example here.

In [4]:
actual_output = [
    [0.1, 0.5, 0.4],
    [0.2, 0.2, 0.3],
    [0.7, 0.4, 0.5],
    [0.3, 0.8, 0.3],
    [0.0, 0.5, 0.3],
    [0.1, 0.5, 0.5],
]
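
For comparison, here is a sketch of how a softmax (an output function commonly paired with a cross-entropy loss) would rescale rows like these so that each sums to 1. The softmax helper is written here for illustration and is not part of Comet:

```python
import numpy as np

actual_output = np.array([
    [0.1, 0.5, 0.4],
    [0.2, 0.2, 0.3],
    [0.7, 0.4, 0.5],
    [0.3, 0.8, 0.3],
    [0.0, 0.5, 0.3],
    [0.1, 0.5, 0.5],
])


def softmax(rows):
    # Subtract each row's maximum first for numerical stability
    shifted = rows - rows.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


# Raw row sums: most of them are not 1
print(actual_output.sum(axis=1))

# After softmax, every row sums to 1 and the largest entry stays the largest
normalized = softmax(actual_output)
print(normalized.sum(axis=1))
```

Because softmax preserves the ordering within each row, the category picked by taking the largest value is the same before and after rescaling.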

Our goal now is to visualize how much the model mixes up the categories. That is, we'd like to see the Confusion Matrix comparing all categories against each other. We can do that easily by simply logging it with the experiment:

In [5]:
experiment.log_confusion_matrix(desired_output, actual_output);

That's it! We can now end the experiment and take a look at the resulting matrix:

In [6]:
experiment.end()
COMET INFO: ----------------------------
COMET INFO: Comet.ml Experiment Summary:
COMET INFO:   Data:
COMET INFO:     url: https://www.comet.ml/cometpublic/confusion-matrix/0d1a88a5038b4507a31061039c269f36
COMET INFO:   Uploads:
COMET INFO:     confusion-matrix: 1
COMET INFO: ----------------------------
COMET INFO: Uploading stats to Comet before program termination (may take several seconds)
In [7]:
experiment.display(tab="confusion-matrices")

For more details on this tab, please see the details on the Confusion Matrix user interface.
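
To make concrete what the chart in this example shows, here is a sketch that computes a confusion matrix by hand with NumPy. It assumes each row is reduced to a single category by taking the position of its largest value, with rows of the matrix as actual categories and columns as predicted ones. This illustrates the idea only; it is not Comet's internal implementation, and details such as tie-breaking may differ:

```python
import numpy as np

desired_output = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])
actual_output = np.array([
    [0.1, 0.5, 0.4],
    [0.2, 0.2, 0.3],
    [0.7, 0.4, 0.5],
    [0.3, 0.8, 0.3],
    [0.0, 0.5, 0.3],
    [0.1, 0.5, 0.5],
])

# Reduce each row to a category index (np.argmax takes the first index on ties)
actual_category = desired_output.argmax(axis=1)
predicted_category = actual_output.argmax(axis=1)

# Count how often each actual category (row) was predicted as each category (column)
num_categories = desired_output.shape[1]
matrix = np.zeros((num_categories, num_categories), dtype=int)
for actual, predicted in zip(actual_category, predicted_category):
    matrix[actual, predicted] += 1

print(matrix)

# Correct predictions lie on the diagonal
accuracy = np.trace(matrix) / matrix.sum()
print(accuracy)
```

With this made-up data, only one of the six patterns lands on the diagonal, which is what we would expect from nearly random output.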

Example 2: Log Confusion Matrices During Learning

This example will create a series of confusion matrices showing how the model gets less confused as training proceeds.

We will train a model on the standard MNIST digit classification task.

We import the items that we will need:

In [8]:
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import to_categorical

from keras.datasets import mnist
Using TensorFlow backend.

We load the training set:

In [9]:
num_classes = 10

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
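
The same preprocessing steps can be tried on a tiny synthetic batch without downloading MNIST. This sketch uses random pixel values and made-up labels (fake_images and fake_labels are our own names), and builds the one-hot targets with NumPy rather than to_categorical:

```python
import numpy as np

# Four fake 28x28 grayscale "images" with pixel values in 0..255
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 0, 9, 3])

# Flatten each image to a 784-long vector and scale pixels to the range 0..1
flat = fake_images.reshape(len(fake_images), 784).astype("float32")
flat /= 255

# Convert class labels to one-hot rows with 10 columns
one_hot = np.eye(10, dtype="float32")[fake_labels]

print(flat.shape, one_hot.shape)
```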

Define a function to create the model:

In [10]:
def create_model():
    model = Sequential()
    model.add(Dense(128, activation="sigmoid", input_shape=(784,)))
    model.add(Dense(128, activation="sigmoid"))
    model.add(Dense(128, activation="sigmoid"))
    model.add(Dense(10, activation="softmax"))
    model.compile(
        loss="categorical_crossentropy", optimizer=RMSprop(), metrics=["accuracy"]
    )
    return model
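
As a sanity check on this architecture, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input-unit pair plus one bias per unit. The helper dense_params below is our own name for this sketch:

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


# Input size followed by the number of units in each Dense layer
layer_sizes = [784, 128, 128, 128, 10]

per_layer = [
    dense_params(inputs, units)
    for inputs, units in zip(layer_sizes, layer_sizes[1:])
]
total = sum(per_layer)

print(per_layer)
print(total)
```

The total agrees with the trainable_params value (134794) that Comet reports in the experiment summaries on this page.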

Next, we define a Keras callback to log the confusion matrix:

In [11]:
class ConfusionMatrixCallback(Callback):
    def __init__(self, experiment, inputs, targets):
        super().__init__()
        self.experiment = experiment
        self.inputs = inputs
        self.targets = targets

    def on_epoch_end(self, epoch, logs=None):
        # Log a confusion matrix for the held-out inputs after every epoch
        predicted = self.model.predict(self.inputs)
        self.experiment.log_confusion_matrix(
            self.targets,
            predicted,
            title="Confusion Matrix, Epoch #%d" % (epoch + 1),
            file_name="confusion-matrix-%03d.json" % (epoch + 1),
        )

And create another Comet experiment:

In [12]:
experiment = Experiment(project_name="confusion-matrix", log_env_details=False, log_code=False)
COMET INFO: Experiment is live on comet.ml https://www.comet.ml/cometpublic/confusion-matrix/743b63b55d23429dad41a1274bd7e6fb

Before any training, we want to log the confusion matrix so that we can see what it looks like before any of the network's weights have been adjusted:

In [13]:
model = create_model()

y_predicted = model.predict(x_test)

We also supply the step (zero, before training), a title, and file_name:

In [14]:
experiment.log_confusion_matrix(
    y_test,
    y_predicted,
    step=0,
    title="Confusion Matrix, Epoch #0",
    file_name="confusion-matrix-%03d.json" % 0,
);

We now create the callback and train the model for 5 epochs:

In [15]:
callback = ConfusionMatrixCallback(experiment, x_test, y_test)

model.fit(
    x_train,
    y_train,
    batch_size=120,
    epochs=5,
    callbacks=[callback],
    validation_data=(x_test, y_test),
)
COMET INFO: Ignoring automatic log_parameter('verbose') because 'keras:verbose' is in COMET_LOGGING_PARAMETERS_IGNORE
COMET INFO: Ignoring automatic log_parameter('do_validation') because 'keras:do_validation' is in COMET_LOGGING_PARAMETERS_IGNORE
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
COMET INFO: Ignoring automatic log_metric('batch_batch') because 'keras:batch_batch' is in COMET_LOGGING_METRICS_IGNORE
COMET INFO: Ignoring automatic log_metric('batch_size') because 'keras:batch_size' is in COMET_LOGGING_METRICS_IGNORE
60000/60000 [==============================] - 7s 118us/sample - loss: 0.7812 - accuracy: 0.7751 - val_loss: 0.3121 - val_accuracy: 0.9102
Epoch 2/5
60000/60000 [==============================] - 7s 114us/sample - loss: 0.2613 - accuracy: 0.9240 - val_loss: 0.2074 - val_accuracy: 0.9373
Epoch 3/5
60000/60000 [==============================] - 7s 116us/sample - loss: 0.1877 - accuracy: 0.9447 - val_loss: 0.1652 - val_accuracy: 0.9492
Epoch 4/5
60000/60000 [==============================] - 7s 113us/sample - loss: 0.1473 - accuracy: 0.9559 - val_loss: 0.1406 - val_accuracy: 0.9579
Epoch 5/5
60000/60000 [==============================] - 7s 113us/sample - loss: 0.1218 - accuracy: 0.9629 - val_loss: 0.1229 - val_accuracy: 0.9632
Out[15]:
<tensorflow.python.keras.callbacks.History at 0x7fa8c3f2ab38>
In [16]:
experiment.end()
COMET INFO: ----------------------------
COMET INFO: Comet.ml Experiment Summary:
COMET INFO:   Data:
COMET INFO:     url: https://www.comet.ml/cometpublic/confusion-matrix/743b63b55d23429dad41a1274bd7e6fb
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     accuracy [5]                : (0.7750999927520752, 0.962933361530304)
COMET INFO:     batch_accuracy [250]        : (0.14166666567325592, 0.9643939137458801)
COMET INFO:     batch_loss [250]            : (0.0368625782430172, 2.6164801120758057)
COMET INFO:     epoch_duration [5]          : (6.749969814998622, 7.092286439998134)
COMET INFO:     loss [5]                    : (0.12177954505011439, 0.7811669090390205)
COMET INFO:     step                        : 2920
COMET INFO:     val_accuracy [5]            : (0.9101999998092651, 0.9631999731063843)
COMET INFO:     val_loss [5]                : (0.12290808337088674, 0.31211401319503784)
COMET INFO:     validate_batch_accuracy [45]: (0.8861111402511597, 0.9916666746139526)
COMET INFO:     validate_batch_loss [45]    : (0.03381219133734703, 0.6426334381103516)
COMET INFO:   Other [count]:
COMET INFO:     trainable_params: 134794
COMET INFO:   Uploads:
COMET INFO:     confusion-matrix: 6
COMET INFO: ----------------------------
COMET INFO: Uploading stats to Comet before program termination (may take several seconds)
COMET INFO: Waiting for completion of the file uploads (may take several seconds)
COMET INFO: Still uploading

Now we take a look at the matrices created during training. You can switch between confusion matrices by selecting the name in the upper left-hand corner.

In [17]:
experiment.display(tab="confusion-matrices")
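
The six confusion-matrix uploads reported in the summary come from the one matrix logged before training plus one per epoch. This small sketch reproduces the titles and file names produced by the format strings used in this example; matrix_names is a helper we made up for illustration:

```python
def matrix_names(epoch_number):
    """Build the title and file name used for a given epoch number."""
    title = "Confusion Matrix, Epoch #%d" % epoch_number
    file_name = "confusion-matrix-%03d.json" % epoch_number
    return title, file_name


num_epochs = 5

# Epoch #0 is logged by hand before training starts
names = [matrix_names(0)]

# Keras numbers epochs from 0, so the callback logs epoch + 1
names += [matrix_names(epoch + 1) for epoch in range(num_epochs)]

for title, file_name in names:
    print(title, "->", file_name)
```

Zero-padding the epoch number in the file name keeps the matrices in training order when they are sorted by name.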

Example 3: Reuse ConfusionMatrix instance

Now, we want to create example images for each of the cells in the matrix. In addition, we want to re-use the images if we can.

For this, we will create a ConfusionMatrix instance and re-use it.

In [18]:
from comet_ml.utils import ConfusionMatrix

To create an example for each item, we write an index_to_example function that takes an index position (an offset into the test data, x_test), creates and logs an image, and then returns a dict containing the sample name and the image's assetId:

In [19]:
def index_to_example(index):
    image_array = x_test[index]
    image_name = "confusion-matrix-%05d.png" % index
    results = experiment.log_image(
        image_array, name=image_name, image_shape=(28, 28, 1)
    )
    # Return sample, assetId (index is added automatically)
    return {"sample": image_name, "assetId": results["imageId"]}
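
The motivation for re-using one ConfusionMatrix instance is that an example already created for a given index should not need to be uploaded again. The sketch below illustrates that idea with a plain dict cache. Everything in it is hypothetical: fake_log_image stands in for a real upload, the asset ids are made up, and this is not how ConfusionMatrix is actually implemented:

```python
uploaded = []  # records every simulated upload


def fake_log_image(index):
    """Hypothetical stand-in for an image upload; returns a made-up asset id."""
    uploaded.append(index)
    return "asset-%05d" % index


example_cache = {}


def cached_index_to_example(index):
    # Only "upload" the image the first time this index is requested
    if index not in example_cache:
        image_name = "confusion-matrix-%05d.png" % index
        example_cache[index] = {
            "sample": image_name,
            "assetId": fake_log_image(index),
        }
    return example_cache[index]


# Simulate three epochs that each ask for the same few examples
for epoch in range(3):
    for index in [7, 42, 7]:
        cached_index_to_example(index)

print(len(uploaded))
```

Nine requests are made here, but only the two distinct indices trigger a simulated upload.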

We create a callback as before; however, this time we will keep track of an instance of the ConfusionMatrix:

In [20]:
class ConfusionMatrixCallback(Callback):
    def __init__(self, experiment, inputs, targets, confusion_matrix):
        super().__init__()
        self.experiment = experiment
        self.inputs = inputs
        self.targets = targets
        self.confusion_matrix = confusion_matrix

    def on_epoch_end(self, epoch, logs=None):
        # Recompute the shared matrix in place, then log it for this epoch
        predicted = self.model.predict(self.inputs)
        self.confusion_matrix.compute_matrix(self.targets, predicted)
        self.experiment.log_confusion_matrix(
            matrix=self.confusion_matrix,
            title="Confusion Matrix, Epoch #%d" % (epoch + 1),
            file_name="confusion-matrix-%03d.json" % (epoch + 1),
        )

We create another Comet experiment:

In [21]:
experiment = Experiment(project_name="confusion-matrix", log_env_details=False, log_code=False)
COMET INFO: Experiment is live on comet.ml https://www.comet.ml/cometpublic/confusion-matrix/9206a808997f44d6b90a910f8dc25bc9

And another model:

In [22]:
model = create_model()

Again, before training, we log the confusion matrix:

In [23]:
# Before any training:
y_predicted = model.predict(x_test)

First, we make an instance, passing in the index_to_example function:

In [24]:
confusion_matrix = ConfusionMatrix(index_to_example_function=index_to_example)

Now, we use the compute_matrix method of the ConfusionMatrix class:

In [25]:
confusion_matrix.compute_matrix(y_test, y_predicted)

We can use the ConfusionMatrix instance to see a rough ASCII version:

In [26]:
confusion_matrix.display()
   A                Confusion Matrix            
   c               Predicted Category           
   t       0   1   2   3   4   5   6   7   8   9
   u   0   0   0   0   0   0   0   0   0 980   0
   a   1   0   0   0   0   0   0   0   0 113   0
   l   2   0   0   0   0   0   0   0   0 103   0
       3   0   0   0   0   0   0   0   0 101   0
   C   4   0   0   0   0   0   0   0   0 982   0
   a   5   0   0   0   0   0   0   0   0 892   0
   t   6   0   0   0   0   0   0   0   0 958   0
   e   7   0   0   0   0   0   0   0   0 102   0
   g   8   0   0   0   0   0   0   0   0 974   0
   o   9   0   0   0   0   0   0   0   0 100   0
   r

This time, rather than logging the actual and predicted vectors, we pass in the entire ConfusionMatrix as the matrix:

In [27]:
experiment.log_confusion_matrix(
    matrix=confusion_matrix,
    step=0,
    title="Confusion Matrix, Epoch #0",
    file_name="confusion-matrix-%03d.json" % 0,
);

Again, we create the callback and train the network (this will take a little more time, as it generates the assets on the fly):

In [28]:
callback = ConfusionMatrixCallback(experiment, x_test, y_test, confusion_matrix)

model.fit(
    x_train,
    y_train,
    batch_size=120,
    epochs=5,
    callbacks=[callback],
    validation_data=(x_test, y_test),
)
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 39s 653us/sample - loss: 0.7895 - accuracy: 0.7731 - val_loss: 0.3226 - val_accuracy: 0.9046
Epoch 2/5
60000/60000 [==============================] - 12s 198us/sample - loss: 0.2653 - accuracy: 0.9219 - val_loss: 0.2205 - val_accuracy: 0.9327
Epoch 3/5
60000/60000 [==============================] - 11s 178us/sample - loss: 0.1900 - accuracy: 0.9447 - val_loss: 0.1631 - val_accuracy: 0.9525
Epoch 4/5
60000/60000 [==============================] - 7s 109us/sample - loss: 0.1472 - accuracy: 0.9565 - val_loss: 0.1450 - val_accuracy: 0.9567
Epoch 5/5
60000/60000 [==============================] - 8s 127us/sample - loss: 0.1202 - accuracy: 0.9640 - val_loss: 0.1264 - val_accuracy: 0.9618
Out[28]:
<tensorflow.python.keras.callbacks.History at 0x7fa8d0612160>

We end the experiment (here you can see how many assets were uploaded):

In [29]:
experiment.end()
COMET INFO: ----------------------------
COMET INFO: Comet.ml Experiment Summary:
COMET INFO:   Data:
COMET INFO:     url: https://www.comet.ml/cometpublic/confusion-matrix/9206a808997f44d6b90a910f8dc25bc9
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     accuracy [5]                : (0.7730500102043152, 0.9639999866485596)
COMET INFO:     batch_accuracy [250]        : (0.09166666865348816, 0.9642958045005798)
COMET INFO:     batch_loss [250]            : (0.03238104283809662, 2.7076375484466553)
COMET INFO:     epoch_duration [5]          : (6.485461345997464, 39.099558262001665)
COMET INFO:     loss [5]                    : (0.12022181416675448, 0.7895453806519508)
COMET INFO:     step                        : 2920
COMET INFO:     val_accuracy [5]            : (0.9046000242233276, 0.9617999792098999)
COMET INFO:     val_loss [5]                : (0.12641188504733145, 0.3226115715056658)
COMET INFO:     validate_batch_accuracy [45]: (0.8808943033218384, 1.0)
COMET INFO:     validate_batch_loss [45]    : (0.028914928436279297, 0.6366378664970398)
COMET INFO:   Other [count]:
COMET INFO:     trainable_params: 134794
COMET INFO:   Uploads:
COMET INFO:     confusion-matrix: 6
COMET INFO:     images          : 1184
COMET INFO: ----------------------------
COMET INFO: Uploading stats to Comet before program termination (may take several seconds)
COMET INFO: Waiting for completion of the file uploads (may take several seconds)
COMET INFO: Still uploading

And see the full confusion matrix, complete with sample images in each cell (click on a cell to see the examples):

In [30]:
experiment.display(tab="confusion-matrices")

In the index_to_example function you can return:

  • an integer, representing the index
  • a string, representing text to show in the Example View
  • a URL, representing a link to show in the Example View
  • a {"sample": NAME, "assetId": ASSET-ID} dictionary, representing an image asset
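
To illustrate the four return shapes listed above, here is a sketch with one function per option. The function names, the example.com URL, and the "ASSET-ID" placeholder are all hypothetical; in real use the asset id would come from logging an image, as in index_to_example earlier:

```python
def example_as_index(index):
    # An integer: just the index itself
    return index


def example_as_text(index):
    # A string: text to show in the Example View
    return "test sample #%d" % index


def example_as_url(index):
    # A URL (hypothetical address): a link to show in the Example View
    return "https://example.com/samples/%d" % index


def example_as_image_asset(index):
    # A dictionary naming an image asset ("ASSET-ID" is a placeholder)
    return {"sample": "confusion-matrix-%05d.png" % index, "assetId": "ASSET-ID"}


for make_example in (
    example_as_index,
    example_as_text,
    example_as_url,
    example_as_image_asset,
):
    print(make_example(12))
```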

The ConfusionMatrix object allows many options, including:

  • automatically finding the "most confused" categories, if more than 25
  • limiting the categories shown (use ConfusionMatrix(selected=[...]))
  • changing the row and column labels
  • changing the category labels
  • changing the title
  • displaying text, URLs, or images in Example View
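
To give a feel for what finding the "most confused" categories could mean, here is one plausible approach sketched with NumPy: rank categories by how many off-diagonal counts they are involved in, as either the actual or the predicted category. This is our own assumption for illustration and may well differ from the method Comet uses; most_confused is a made-up helper:

```python
import numpy as np


def most_confused(matrix, k):
    """Return the k category indices with the most off-diagonal counts."""
    off_diagonal = np.array(matrix)
    np.fill_diagonal(off_diagonal, 0)  # ignore correct predictions
    # Errors where the category is the predicted one, plus errors where it is the actual one
    confusion = off_diagonal.sum(axis=0) + off_diagonal.sum(axis=1)
    ranked = np.argsort(-confusion, kind="stable")
    return sorted(ranked[:k].tolist())


# Made-up 4-category matrix: categories 1 and 2 are often mixed up
matrix = [
    [50, 0, 0, 0],
    [0, 40, 9, 1],
    [0, 12, 38, 0],
    [0, 0, 0, 50],
]

selected = most_confused(matrix, k=2)
print(selected)
```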

We hope that this gives you some ideas of how you can use the Comet Confusion Matrix! If you have questions or comments, feel free to visit the Comet issue tracker and leave us a note.