Using Comet Panels to Visualize Model Analysis

Machine learning models tend to perform inconsistently across different parts of a dataset. Summary performance metrics such as AUC and F1 are not enough to identify the parts of the data where a model needs improvement. Model analysis tools such as Uber’s Manifold or TensorFlow Model Analysis let us visualize model performance at a more granular level.
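To see how a single summary number can hide a weak slice, here is a toy illustration in plain Python. The data is made up for the example: a model that is always right on one slice and always wrong on another still reports a respectable overall accuracy.

```python
from collections import defaultdict


def accuracy(pairs):
    """Fraction of (label, prediction) pairs where the prediction matches the label."""
    return sum(1 for label, pred in pairs if label == pred) / len(pairs)


def accuracy_by_slice(rows, slice_key):
    """Group rows by the value of `slice_key` and compute accuracy for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[slice_key]].append((row["label"], row["prediction"]))
    return {value: accuracy(pairs) for value, pairs in groups.items()}


# Hypothetical toy data: 8 weekday examples (all correct), 2 weekend examples (all wrong).
rows = (
    [{"day": "weekday", "label": 1, "prediction": 1}] * 8
    + [{"day": "weekend", "label": 1, "prediction": 0}] * 2
)

overall = accuracy([(r["label"], r["prediction"]) for r in rows])
per_slice = accuracy_by_slice(rows, "day")
print(overall)    # 0.8
print(per_slice)  # {'weekday': 1.0, 'weekend': 0.0}
```

The overall figure of 0.8 says nothing about the weekend slice, where the model is always wrong; slicing is what surfaces it.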

In this post we will see how to use the TensorFlow Model Analysis library to visualize how our evaluation metrics differ across feature slices of the dataset. We will then log these visualizations to Comet and use Panels to compare the analyses of different models.

What is TensorFlow Model Analysis (TFMA)?

TensorFlow Model Analysis is a library for evaluating models across different slices of data. In this post, we’re going to run the example provided in the TFMA documentation.

You can try the example for yourself in Colab, or check out the results from these experiments here.

Note: In Colab, you will have to update pip, install the TFMA package, and restart the runtime before you can run the notebook. You only have to do this once; TFMA does not need to be reinstalled after the restart.


There is a little bit of setup involved with using TFMA. We’re first going to download a TAR file that contains the following:

  1. Training and Evaluation Datasets
  2. A Schema generated by TensorFlow Data Validation
  3. A saved Keras model
import io, os, tempfile
TAR_NAME = 'saved_models-2.2'
BASE_DIR = tempfile.mkdtemp()
DATA_DIR = os.path.join(BASE_DIR, TAR_NAME, 'data')
MODELS_DIR = os.path.join(BASE_DIR, TAR_NAME, 'models')
SCHEMA = os.path.join(BASE_DIR, TAR_NAME, 'schema.pbtxt')
OUTPUT_DIR = os.path.join(BASE_DIR, 'output')

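# The archive's download URL (elided here) is given in the TFMA example notebook.
!curl -O …/{TAR_NAME}.tar
!tar xf {TAR_NAME}.tar
# Move the extracted folder into BASE_DIR so that DATA_DIR, MODELS_DIR and SCHEMA resolve.
!mv {TAR_NAME} {BASE_DIR}
!rm {TAR_NAME}.tar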

print("Here's what we downloaded:")
!ls -R {BASE_DIR}
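Every later step reads from DATA_DIR, MODELS_DIR and SCHEMA, so a quick existence check right after extraction catches a failed download or an archive extracted somewhere other than BASE_DIR. A minimal sketch follows; the helper name missing_paths is ours and is not part of TFMA.

```python
import os


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, preserving order."""
    return [p for p in paths if not os.path.exists(p)]


# Usage in the notebook, after the download cell:
# problems = missing_paths([SCHEMA, os.path.join(DATA_DIR, 'eval', 'data.csv'), MODELS_DIR])
# assert not problems, f"Missing after download: {problems}"
```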

The next thing we’re going to do is convert our evaluation data from CSV to the TFRecord format, using the schema to decide how each column is parsed. The schema is a proto that describes the properties of the dataset. Some of those properties include:

  • Which features are expected to be present
  • The datatype of the features
  • The number of values for a feature in each example
  • The presence of each feature across all examples
  • The expected domains of features
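To make a few of these properties concrete (type, presence, and domain), here is a toy validator in plain Python. It is only an illustration: the real schema is a tensorflow_metadata proto checked by TensorFlow Data Validation, whereas this sketch uses a hypothetical dict, and the domains chosen below are assumptions for the example.

```python
# Hypothetical, simplified stand-in for the schema proto.
TOY_SCHEMA = {
    "trip_start_hour": {"type": int, "required": True, "domain": range(0, 24)},
    "fare": {"type": float, "required": True, "domain": None},
    "tips": {"type": float, "required": False, "domain": None},
}


def check_row(row, schema=TOY_SCHEMA):
    """Return a list of problems found in one CSV row (a dict of strings)."""
    problems = []
    for name, spec in schema.items():
        raw = row.get(name, "")
        if raw == "":
            # Presence: only required features must have a value.
            if spec["required"]:
                problems.append(f"{name}: missing")
            continue
        try:
            # Type: the raw string must convert to the expected type.
            value = spec["type"](raw)
        except ValueError:
            problems.append(f"{name}: expected {spec['type'].__name__}, got {raw!r}")
            continue
        # Domain: the converted value must fall inside the allowed set or range.
        if spec["domain"] is not None and value not in spec["domain"]:
            problems.append(f"{name}: {value!r} outside expected domain")
    return problems


print(check_row({"trip_start_hour": "25", "fare": "abc"}))
```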
import csv
import tensorflow as tf
from google.protobuf import text_format
from tensorflow.python.lib.io import file_io
from tensorflow_metadata.proto.v0 import schema_pb2
from tensorflow.core.example import example_pb2

# Load the schema proto from its text representation.
schema = schema_pb2.Schema()
contents = file_io.read_file_to_string(SCHEMA)
schema = text_format.Parse(contents, schema)

# Convert each CSV row to a tf.train.Example, typing every feature according to the schema.
datafile = os.path.join(DATA_DIR, 'eval', 'data.csv')
reader = csv.DictReader(open(datafile, 'r'))
examples = []
for line in reader:
  example = example_pb2.Example()
  for feature in schema.feature:
    key = feature.name
    # Empty CSV cells become empty value lists, i.e. a missing feature.
    if feature.type == schema_pb2.FLOAT:
      example.features.feature[key].float_list.value[:] = (
          [float(line[key])] if len(line[key]) > 0 else [])
    elif feature.type == schema_pb2.INT:
      example.features.feature[key].int64_list.value[:] = (
          [int(line[key])] if len(line[key]) > 0 else [])
    elif feature.type == schema_pb2.BYTES:
      example.features.feature[key].bytes_list.value[:] = (
          [line[key].encode('utf8')] if len(line[key]) > 0 else [])
  # Label: did the rider tip more than 20% of the fare?
  big_tipper = float(line['tips']) > float(line['fare']) * 0.2
  example.features.feature['big_tipper'].float_list.value[:] = [big_tipper]
  examples.append(example)

# Write the serialized examples to a TFRecord file (the filename is our own choice).
tfrecord_file = os.path.join(BASE_DIR, 'eval_data.tfrecord')
with tf.io.TFRecordWriter(tfrecord_file) as writer:
  for example in examples:
    writer.write(example.SerializeToString())

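The type dispatch inside the conversion loop, together with the big_tipper label rule, can be isolated into pure-Python helpers that are easy to test without TensorFlow. This is a sketch under one assumption: the plain strings "FLOAT", "INT" and "BYTES" stand in for the schema_pb2 enum values.

```python
def parse_value(raw, kind):
    """Convert one CSV string into a list of typed values; "" becomes [] (missing feature)."""
    if raw == "":
        return []
    if kind == "FLOAT":
        return [float(raw)]
    if kind == "INT":
        return [int(raw)]
    if kind == "BYTES":
        return [raw.encode("utf8")]
    raise ValueError(f"unknown feature type: {kind}")


def is_big_tipper(row):
    """Label rule from the conversion loop: the tip exceeds 20% of the fare."""
    return float(row["tips"]) > float(row["fare"]) * 0.2


print(parse_value("7", "INT"))                         # [7]
print(is_big_tipper({"fare": "10.0", "tips": "2.5"}))  # True
```

Note that the rule uses a strict comparison, so a tip of exactly 20% is labeled as not a big tipper.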
Now that our data is in the correct format, let’s define the evaluation configuration for our model analysis. The config is written in protobuf text format and parsed into a tfma.EvalConfig with Google’s protobuf library (text_format.Parse). It defines the following specs:

  • Model Spec: defines model-specific parameters. The fields relevant here are:
    • name – name of the model (only needed when evaluating multiple models)
    • signature_name – name of the signature used for predictions (default is serving_default). Use eval if using an EvalSavedModel.
    • label_key – name of the feature that holds the label.
  • Metric Specs: define your metrics of interest here, for example Accuracy, Precision, ROC, etc.
  • Slicing Specs: define the features on which you will slice your dataset. Slicing can be done either by feature_keys or by feature_values.
import tensorflow_model_analysis as tfma

# Set up tfma.EvalConfig settings
keras_eval_config = text_format.Parse("""
  ## Model information
  model_specs {
    # For keras (and serving models) we need to add a `label_key`.
    label_key: "big_tipper"
  }

  ## Post training metric information. These will be merged with any built-in
  ## metrics from training.
  metrics_specs {
    metrics { class_name: "ExampleCount" }
    metrics { class_name: "BinaryAccuracy" }
    metrics { class_name: "BinaryCrossentropy" }
    metrics { class_name: "AUC" }
    metrics { class_name: "AUCPrecisionRecall" }
    metrics { class_name: "Precision" }
    metrics { class_name: "Recall" }
    metrics { class_name: "MeanLabel" }
    metrics { class_name: "MeanPrediction" }
    metrics { class_name: "Calibration" }
    metrics { class_name: "CalibrationPlot" }
    metrics { class_name: "ConfusionMatrixPlot" }
    # ... add additional metrics and plots ...
  }

  ## Slicing information
  slicing_specs {}  # overall slice
  slicing_specs {
    feature_keys: ["trip_start_hour"]
  }
  slicing_specs {
    feature_keys: ["trip_start_day"]
  }
  slicing_specs {
    feature_values: {
      key: "trip_start_month"
      value: "1"
    }
  }
  slicing_specs {
    feature_keys: ["trip_start_hour", "trip_start_day"]
  }
""", tfma.EvalConfig())

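To build intuition for what the slicing specs in the config produce, here is a rough pure-Python model of the grouping. It is not TFMA's implementation. It assumes rows are plain dicts and that feature_values matches on the string form of a value. Each slice is keyed by a tuple of (feature, value) pairs, with the empty tuple standing for the overall slice.

```python
from collections import defaultdict


def slice_key(row, spec):
    """Return the slice a row belongs to under one spec, or None if the row is excluded.

    `spec` mirrors a slicing_specs entry as a dict with optional "feature_keys"
    (list of names) and "feature_values" (name -> required value as a string).
    """
    for name, wanted in spec.get("feature_values", {}).items():
        if name not in row or str(row[name]) != wanted:
            return None  # row lacks the required value
    for name in spec.get("feature_keys", []):
        if name not in row:
            return None  # cannot slice on a feature the row does not have
    keys = tuple((name, row[name]) for name in spec.get("feature_keys", []))
    return keys + tuple(spec.get("feature_values", {}).items())


def group_by_slices(rows, specs):
    """Map every slice key produced by `specs` to the rows that fall into it."""
    groups = defaultdict(list)
    for row in rows:
        for spec in specs:
            key = slice_key(row, spec)
            if key is not None:
                groups[key].append(row)
    return dict(groups)


# The same specs as the config, written as dicts.
specs = [
    {},                                                       # overall slice
    {"feature_keys": ["trip_start_hour"]},                    # one slice per hour
    {"feature_values": {"trip_start_month": "1"}},            # January only
    {"feature_keys": ["trip_start_hour", "trip_start_day"]},  # hour x day crosses
]

# Hypothetical toy rows.
rows = [
    {"trip_start_hour": 7, "trip_start_day": 2, "trip_start_month": 1},
    {"trip_start_hour": 7, "trip_start_day": 3, "trip_start_month": 2},
    {"trip_start_hour": 8, "trip_start_day": 2, "trip_start_month": 1},
]

counts = {key: len(members) for key, members in group_by_slices(rows, specs).items()}
for key, n in counts.items():
    print(key, n)
```

A feature_keys spec yields one slice per observed value (or value combination), while a feature_values spec yields a single slice restricted to matching rows.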
Autologging to Comet

Once we have set up our model and data, we can run TFMA:

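# Create a tfma.EvalSharedModel that points at our keras model.
# MODEL_VERSION: the numbered version directory under models/keras (we take the highest one).
MODEL_VERSION = max((d for d in os.listdir(os.path.join(MODELS_DIR, 'keras')) if d.isdigit()), key=int)
keras_model_path = os.path.join(MODELS_DIR, 'keras', MODEL_VERSION)
keras_eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=keras_model_path, eval_config=keras_eval_config)

keras_output_path = os.path.join(OUTPUT_DIR, 'keras')

# Run TFMA over the TFRecord file written earlier.
results = tfma.run_model_analysis(
    eval_shared_model=keras_eval_shared_model,
    eval_config=keras_eval_config,
    data_location=tfrecord_file,
    output_path=keras_output_path)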

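We can now log the results to Comet with the following snippet and visualize them using the TFMA Viewer Custom Panel. With COMET_AUTO_LOG_TFMA set, the view rendered by tfma.view.render_slicing_metrics is logged to the active experiment as an HTML asset, which the Panel then displays.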

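import os
# Comet recommends importing comet_ml before the ML libraries it auto-logs;
# in a notebook, consider moving this import to the first cell.
import comet_ml

os.environ["COMET_AUTO_LOG_TFMA"] = "1"
os.environ["COMET_API_KEY"] = "YOUR_API_KEY"
os.environ["COMET_PROJECT_NAME"] = "tf_mode_analysis"

# The experiment reads the settings above from the environment.
experiment = comet_ml.Experiment()

# With TFMA auto-logging enabled, this rendered view is also logged to the experiment.
tfma.view.render_slicing_metrics(results, slicing_column='trip_start_day')

# In a notebook, end the experiment explicitly once logging is done.
experiment.end()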



TFMA provides various types of plots to debug our model. You can find a full list here. Since we’re analyzing a classification model, some of the plots we will use are the Residuals Plots, Calibration Plots, Precision-Recall Curve and ROC Curve.

Calibration Plots and Residuals Plots compare the model’s predictions against the true label values. With TFMA, the Residuals Plot also shows how many examples fall into each residual bin.
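TFMA computes its calibration plot for you; the sketch below only illustrates the idea behind it. Predictions are bucketed, and within each bucket the mean prediction is compared with the mean label, along with the number of examples in the bucket. The equal-width binning and the default of 5 bins are arbitrary choices for this example, not TFMA's settings.

```python
def calibration_bins(labels, predictions, num_bins=5):
    """Bucket predictions into equal-width bins over [0, 1].

    Returns (bin_index, count, mean_prediction, mean_label) for each non-empty bin.
    A well-calibrated model has mean_prediction close to mean_label in every bin.
    """
    totals = [[0, 0.0, 0.0] for _ in range(num_bins)]  # count, sum of predictions, sum of labels
    for label, pred in zip(labels, predictions):
        idx = min(int(pred * num_bins), num_bins - 1)  # a prediction of 1.0 goes in the last bin
        totals[idx][0] += 1
        totals[idx][1] += pred
        totals[idx][2] += label
    return [(i, n, p / n, l / n) for i, (n, p, l) in enumerate(totals) if n > 0]


for row in calibration_bins([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9]):
    print(row)
```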

The ROC curves help with determining cutoff thresholds for our classifiers, while Precision-Recall curves are useful for profiling classifiers on imbalanced datasets.
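Both curves are traced by sweeping a decision threshold over the model's scores. The sketch below computes the ROC coordinates (true and false positive rates) and precision at each threshold on a hypothetical imbalanced toy set with one positive among ten examples. Reporting a precision of 1.0 when nothing is predicted positive is a convention we chose here, not a TFMA rule.

```python
def rates_at_threshold(labels, scores, threshold):
    """Confusion-matrix rates when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    tn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0        # recall: ROC y-axis
    fpr = fp / (fp + tn) if fp + tn else 0.0        # ROC x-axis
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted positive
    return {"tpr": tpr, "fpr": fpr, "precision": precision}


# Hypothetical imbalanced toy data: one positive among ten examples.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

# Sweep every distinct score as a threshold, highest first, to trace both curves.
curve = [(t, rates_at_threshold(labels, scores, t)) for t in sorted(set(scores), reverse=True)]
for threshold, rates in curve:
    print(threshold, rates)
```

At a threshold of 0.8 the ROC point looks strong (TPR 1.0, FPR about 0.11), yet precision is only 0.5. Because negatives dominate, the false positive rate stays small even when half of the positive predictions are wrong, which is why precision-recall curves are the more telling view on imbalanced data.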

We also have plots of the metrics defined in our evaluation config, computed on slices of an input feature. In this case, we’re slicing the data on the trip_start_day feature. Selecting the tfma_slice_metrics_2.html option under the TFMA asset in the Panel displays the plot. These plots show model performance on each slice of data, binned by feature value.

TFMA Panels are available at both the Project level and the Experiment level, so you can compare the analysis of multiple models or focus on a single model. TFMA Panels are also available when you run a diff of two experiments.

TFMA Panel in the project level view (link to experiment)
Selecting a visualization in the Project Level View (link to experiment)
TFMA Panel in the Experiment View (link to experiment)
TFMA Panel in the Diff View (link to experiment)

You can explore these plots in the embedded panel below. Toggle between the different experiments and types of plots by clicking on the experiment name or TFMA asset name.

To find out more about logging TensorFlow Model Analysis results, check out our docs.
