
Comet + Snowflake: A Powerful Combination for Better Reproducibility and Visibility Into Your ML Workflow

Companies around the world use Snowflake to securely store, manage, and process their data at scale. Teams and organizations that want to train machine learning (ML) models on their Snowflake data can use Snowpark and its integrated repository of Python ML libraries and frameworks.

Comet is an MLOps platform that allows ML teams to reproduce, debug, manage, and monitor their models with our Experiment Management, Artifacts, Model Registry, and Model Production Monitoring products.

The Need for Dataset Lineage and Versioning in ML

“Garbage in, garbage out” is a common saying in the machine learning world. If you give your machine learning model bad data to train on, chances are it won’t perform to your expectations. During the debugging phase of the ML lifecycle, it’s important for practitioners not only to view their model’s metrics, but also to log exactly which dataset version was used for training. For example, dataset version 1.0.2 might contain more recent data that helps the model generalize better. But how would you know that’s the case if you have no way of seeing the lineage from a training run to a specific dataset version?

 

Upload Snowpark DataFrames as Comet Artifacts

Comet’s integration with Snowflake makes it seamless to upload a Snowpark DataFrame as a Comet Artifact.

Then, within Comet’s Artifact UI, users can find the following:

  1. Dataset version 
  2. SQL Query used to create the DataFrame 
  3. Sample data from the DataFrame
  4. Link back to the Snowflake UI where the data is stored 
  5. Lineage to see which experiments are using this dataset
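
As a rough illustration of the details listed above, the sketch below gathers the SQL query, a few sample rows, and a Snowflake link into artifact metadata, then logs a dataset artifact through the general-purpose Comet SDK calls (`Artifact`, `add_remote`, `log_artifact`). It is not Comet’s dedicated Snowflake integration. The helper names, the metadata keys, and the example URL are hypothetical, and the query and sample rows are assumed to have been pulled from the Snowpark DataFrame beforehand.

```python
import json


def build_snowpark_artifact_metadata(sql_query, sample_rows, snowflake_url):
    """Bundle the dataset details into a metadata dict (hypothetical keys)."""
    metadata = {
        "sql_query": sql_query,          # query that produced the DataFrame
        "sample_rows": sample_rows,      # a handful of rows, as plain dicts
        "snowflake_url": snowflake_url,  # link back to where the data lives
    }
    # Artifact metadata is expected to be JSON-encodable, so fail early here
    # rather than at upload time.
    json.dumps(metadata)
    return metadata


def log_dataset_artifact(experiment, name, metadata, version=None):
    """Log a dataset artifact that points at Snowflake instead of copying data."""
    # Imported lazily so the metadata helper works without comet_ml installed.
    from comet_ml import Artifact

    artifact = Artifact(
        name=name,
        artifact_type="dataset",
        version=version,
        metadata=metadata,
    )
    # Record a remote reference; the rows themselves stay in Snowflake.
    artifact.add_remote(metadata["snowflake_url"], logical_path=name)
    return experiment.log_artifact(artifact)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    meta = build_snowpark_artifact_metadata(
        sql_query="SELECT * FROM TRAINING_DATA",
        sample_rows=[{"ID": 1, "LABEL": 0}],
        snowflake_url="https://example.invalid/snowflake-worksheet",
    )
    print(json.dumps(meta, indent=2))
```

With a live `comet_ml.Experiment`, calling `log_dataset_artifact(experiment, "training-data", meta, version="1.0.2")` would create that dataset version, assuming the workspace and API key are already configured.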

Connect Snowpark Artifacts to an Experiment

Comet tracks all the relevant information needed to reproduce and debug model training runs. With Comet’s SDK, developers can log metrics, hyper-parameters, code, assets, and artifacts for an experiment and visualize them within Comet.

By linking an artifact to an experiment, you can trace a training run back to the exact data it used, which makes it possible to fully debug your model. In the graphic below, see how in just a couple of clicks in the Comet UI, practitioners can see the output metrics, code, model graph, and dataset version for a training run using Snowflake!

[Figure: Snowflake single experiment]
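
A minimal sketch of how a training run might be tied to a dataset version with the Comet SDK follows. It assumes the experiment object exposes `get_artifact`, `log_parameters`, and `log_metrics`, as `comet_ml.Experiment` does. The function name `run_training` and the `train_fn` callback are hypothetical. As we understand the SDK, retrieving the artifact through the experiment is what lets Comet associate that dataset version with the run.

```python
def run_training(experiment, artifact_name, version, params, train_fn):
    """Fetch a dataset artifact through the experiment, train, and log results.

    `experiment` is expected to behave like comet_ml.Experiment.
    `train_fn` is a hypothetical callback that takes the hyper-parameters
    and returns a dict of metric names to values.
    """
    # Requesting the artifact from the experiment links run and dataset version.
    logged_artifact = experiment.get_artifact(
        artifact_name, version_or_alias=version
    )

    # Record the hyper-parameters before training starts.
    experiment.log_parameters(params)

    # Train and log whatever metrics the callback reports.
    metrics = train_fn(params)
    experiment.log_metrics(metrics)

    return logged_artifact, metrics
```

A call might look like `run_training(experiment, "training-data", "1.0.2", {"lr": 0.01}, train_fn=my_train)`, where `my_train` stands in for your own training routine. Passing the experiment in as an argument keeps the function easy to exercise with a stub object in unit tests.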

Try for free 

Comet is easy to integrate with your current machine learning workflows. Sign up for a free account today and see how easy it makes debugging and reproducing machine learning models!

Siddharth Mehta

ML Growth Engineer @ Comet. Interested in Computer Vision, Robotics, and Reinforcement Learning