Logging Histograms, Gradients and Activations with Comet

Introduction

3D histograms, or ridge plots, are a great way to visualize the training progress of your neural network. The distributions of the weights, gradients, and activations give us some intuition for the loss surface that we are trying to optimize.

For example, your model may be approaching a local minimum if you observe that your gradients are becoming smaller over time. On the flip side, a trend of large gradients with high variance might imply that you should reduce your learning rate, or use some form of regularization.
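To make that intuition concrete, here is a small sketch that reduces each step's gradients to two numbers, their mean absolute value and their variance. The function name `gradient_summary` and the synthetic data are illustrative assumptions, not part of the Comet SDK; with a real model you would pass in the gradients collected during training.

```python
import numpy as np


def gradient_summary(grad_history):
    """Return (mean absolute value, variance) of the gradients at each step."""
    return [(float(np.mean(np.abs(g))), float(np.var(g))) for g in grad_history]


# Synthetic gradients whose scale shrinks over time, as it might
# when a model approaches a minimum. These are made-up values.
rng = np.random.default_rng(0)
history = [rng.normal(0.0, scale, size=1000) for scale in (1.0, 0.1, 0.01)]
summary = gradient_summary(history)

for step_index, (mean_abs, variance) in enumerate(summary):
    print(step_index, mean_abs, variance)
```

A histogram shows the same trend with more detail: the distribution narrows around zero as the gradients shrink.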

The Comet SDK provides an easy way to visualize your weights, activations and gradients using the `log_histogram_3d` method. For this post we will provide examples of histogram logging with Comet using TensorFlow’s Gradient Tape.
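As a minimal sketch of the call itself, the helper below flattens an array of values and passes it to `log_histogram_3d` with a name and a step. The helper name `log_layer_histogram` is hypothetical, and `experiment` is assumed to be a `comet_ml.Experiment` that you have already created with your own API key.

```python
import numpy as np


def log_layer_histogram(experiment, values, name, step):
    """Log one histogram for `name` at training step `step`.

    `experiment` is assumed to be a comet_ml.Experiment (or any object
    that exposes the same log_histogram_3d method).
    Returns the number of values that were logged.
    """
    flat = np.asarray(values, dtype=np.float32).ravel()
    experiment.log_histogram_3d(flat, name=name, step=step)
    return flat.size
```

Calling the helper with the same name at successive steps is what lets the histograms for that name be compared over time.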

You can explore the visualizations in this post here. We have also included Colab Notebooks at the end of this post so that you can try out the histograms feature for yourself!

Logging Histograms

For this example, we are going to use a simple perceptron with two hidden layers and train it on the MNIST dataset. Let’s start by loading in our data and defining our model.

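import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical


def get_dataset():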
    num_classes = 10

    # the data, shuffled and split between train and test sets
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_train = x_train.reshape(60000, 784)
    x_test = x_test.reshape(10000, 784)
    x_train = x_train.astype("float32")
    x_test = x_test.astype("float32")
    x_train /= 255
    x_test /= 255
    print(x_train.shape[0], "train samples")
    print(x_test.shape[0], "test samples")

    # convert class vectors to binary class matrices
    y_train = to_categorical(y_train, num_classes)
    y_test = to_categorical(y_test, num_classes)

    return x_train, y_train, x_test, y_test

def build_model_graph():
    model = Sequential()
    model.add(Dense(128, activation="sigmoid", input_shape=(784,), name="dense1"))
    model.add(Dense(64, activation="sigmoid", name="dense2"))
    model.add(Dense(10, activation="softmax", name="output"))

    return model

TensorFlow’s GradientTape records the operations that run during eager execution so that gradients can be computed by automatic differentiation, without precomputing a static graph in which inputs are fed in through placeholders.

This implies that at every training step, we have to calculate the gradients of our loss with respect to our parameters, and apply them with the optimizer in order to update the weights.

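def step(model, X, y, gradmap=None, activations=None):
    # use fresh dictionaries rather than mutable default arguments,
    # which would be shared across calls
    gradmap = {} if gradmap is None else gradmap
    activations = {} if activations is None else activations

    with tf.GradientTape() as tape:
        pred = model(X)
        loss = categorical_crossentropy(y, pred)

    # opt is the optimizer, defined at module level
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))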

    gradmap = get_gradients(gradmap, grads, model)
    activations = get_activations(activations, X, model)

    return loss.numpy().mean(), gradmap, activations
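The `step` function calls two helpers, `get_gradients` and `get_activations`, that are not shown in this post. The sketch below is one way they might look and is an assumption, not the original implementation. It relies only on `model.trainable_variables`, `model.layers`, and each layer being callable, and it averages activations over the batch axis so that the accumulated shape does not depend on the batch size.

```python
import numpy as np


def get_gradients(gradmap, grads, model):
    """Add this batch's gradients to the running per-variable sums."""
    for grad, var in zip(grads, model.trainable_variables):
        if grad is None:  # the variable did not contribute to the loss
            continue
        gradmap[var.name] = gradmap.get(var.name, 0.0) + np.asarray(grad)
    return gradmap


def get_activations(activations, X, model):
    """Add this batch's mean activation per layer to the running sums."""
    out = X
    for layer in model.layers:
        out = layer(out)  # feed the previous layer's output forward
        batch_mean = np.asarray(out).mean(axis=0)
        activations[layer.name] = activations.get(layer.name, 0.0) + batch_mean
    return activations
```

Both helpers return the dictionary they were given, so the caller can keep passing the same accumulators from one batch to the next.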

We will use two dictionaries to store layerwise information about the gradients and activations. For each batch of data we will accumulate the values of the gradients and activations. At the end of every epoch we will divide these sums by the number of batches in the epoch to get per-batch averages, and log them to Comet.

def train(model, X, y, epoch, steps_per_epoch, experiment):
    gradmap = {}
    activations = {}
    total_loss = 0
    with experiment.train():
        # show the current epoch number
        print("[INFO] starting epoch {}/{}...".format(epoch, EPOCHS), end="")
        for i in range(0, steps_per_epoch):
            start = i * BS
            end = start + BS
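            # global batch index, assuming epochs are numbered from 1,
            # so that steps keep increasing from one epoch to the next
            curr_step = (epoch - 1) * steps_per_epoch + i + 1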
            
            loss, gradmap, activations = step(
                model,
                X[start:end],
                y[start:end],
                gradmap,
                activations,
            )
            experiment.log_metric("batch_loss", loss, step=curr_step)
            
            total_loss += loss
        
        experiment.log_metric(
            "loss", total_loss / steps_per_epoch, step=epoch * steps_per_epoch
        )

    # scale gradients
    for k, v in gradmap.items():
        gradmap[k] = v / steps_per_epoch

    # scale activations
    for k, v in activations.items():
        activations[k] = v / steps_per_epoch

    log_weights(experiment, model, epoch * steps_per_epoch)
    log_histogram(experiment, gradmap, epoch * steps_per_epoch, prefix="gradient")
    log_histogram(experiment, activations, epoch * steps_per_epoch, prefix="activation")
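The `train` function also relies on two logging helpers, `log_weights` and `log_histogram`, that are not shown in this post. The sketch below is a hypothetical version built on `log_histogram_3d`. The `prefix/name` naming scheme and the `weight` prefix are assumptions made for illustration, and each variable is assumed to expose a `numpy()` method, as TensorFlow variables do in eager mode.

```python
import numpy as np


def log_histogram(experiment, value_map, step, prefix="gradient"):
    """Log every entry of `value_map` as a histogram at `step`."""
    for name, values in value_map.items():
        experiment.log_histogram_3d(
            np.asarray(values), name="%s/%s" % (prefix, name), step=step
        )


def log_weights(experiment, model, step):
    """Log the current value of every trainable variable at `step`."""
    for var in model.trainable_variables:
        experiment.log_histogram_3d(
            var.numpy(), name="weight/%s" % var.name, step=step
        )
```

Using a shared prefix per kind of value keeps weights, gradients, and activations for the same layer easy to tell apart by name.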

We can view these logged values under the Histograms tab of our Comet Experiment.

Figure 1. Logged Histograms in Comet (link to Comet Experiment)

We can sort and search the various histograms using the options in the menu. For example, we might want to group our histograms based on the reported names.


Figure 2. Selecting grouping for Histograms (link to Comet Experiment)

Figure 3. Histograms Grouped by Name (link to Comet Experiment)

Figure 4. Histograms Grouped by Name Expanded View

For larger models, we may have many such histograms. Comet makes it possible to search for the specific histogram that you’re interested in. In Figure 5 we can see how we are able to search for the histogram distribution of the gradients of a single layer. 


Figure 5. Filtered Histograms based on Name

Additionally, Comet provides Custom Panels that can visualize these histograms as a heatmap over time. This particular custom panel can be found in Panels Gallery → Public Panels → Histogram by Step. After adding the Panel to the project, we should see it above the Experiment Table. 

Figure 6. Adding a custom panel to visualize histograms

Figure 7. Custom Panel to Visualize Histograms in the Project 

This custom panel has dropdowns for both the experiment IDs and the histogram names. We can select a particular experiment from the dropdown and then visualize the individual histograms in that experiment as a heatmap over time.


Figure 8. Selecting an experiment from the dropdown in a Custom Panel

Figure 9. Selecting a histogram in the experiment from the dropdown in a Custom Panel

Conclusion

In this post we demonstrated how Comet’s histogram logging and Custom Panels can be used to visualize the weights, gradients, and activations of a Neural Network.

Colab Notebooks:

TF Gradient Tape Example

PyTorch Example

