New Integration: Comet + Spark NLP

We’re excited to announce a new Comet integration: Spark NLP! This integration allows data scientists and teams to leverage Comet’s experiment tracking and visualization tools alongside Spark NLP’s powerful library for building production-grade, state-of-the-art NLP models.

About Spark NLP

Spark NLP is an open-source text processing library (available in Python, Java, and Scala) from John Snow Labs that provides access to production-grade, scalable, and trainable versions of the latest research in natural language processing.

Spark NLP offers an unmatched combination of speed, scalability, and accuracy that makes it the most widely-used NLP library in the enterprise. It includes out-of-the-box functionality for 8 different NLP tasks, more than 4,000 pre-trained models and pipelines, and support for more than 200 languages.

About the Integration

Spark NLP now ships with a dedicated CometLogger, so you can log metrics, hyperparameters, source code, visualizations, and much more from your Spark NLP runs to the Comet UI.

Once you’ve set up your Comet account and configured your Comet API key for your project, simply import the CometLogger from Spark NLP. From there you can also take advantage of Comet’s rich visualization capabilities.
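One common way to supply your credentials (Comet also supports a `.comet.config` file, among other options) is through environment variables set before the Spark session starts. A minimal sketch, where the key and project name below are placeholders you would replace with your own values:

```python
import os

# Comet reads its credentials from environment variables (among other
# sources, such as a .comet.config file). "YOUR-API-KEY" and the project
# name below are placeholders -- substitute your own values.
os.environ["COMET_API_KEY"] = "YOUR-API-KEY"
os.environ["COMET_PROJECT_NAME"] = "spark-nlp-demo"

# Sanity check: the key is now visible to any library started in this process.
print(os.environ["COMET_API_KEY"])
```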

The code snippet below shows how you can easily log Spark NLP training runs to the Comet UI.

import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.logging.comet import CometLogger

spark = sparknlp.start()

# Spark NLP will write its training logs here; the CometLogger watches this path.
OUTPUT_LOG_PATH = "./run"
logger = CometLogger()

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
embds = (
    UniversalSentenceEncoder.pretrained()
    .setInputCols("document")
    .setOutputCol("sentence_embeddings")
)
multiClassifier = (
    MultiClassifierDLApproach()
    .setInputCols("sentence_embeddings")
    .setOutputCol("category")
    .setLabelColumn("labels")
    .setBatchSize(128)
    .setLr(1e-3)
    .setThreshold(0.5)
    .setShufflePerEpoch(False)
    .setEnableOutputLogs(True)
    .setOutputLogsPath(OUTPUT_LOG_PATH)
    .setMaxEpochs(1)
)

# Stream metrics from the training log files to Comet as they are written.
logger.monitor(logdir=OUTPUT_LOG_PATH, model=multiClassifier)
trainDataset = spark.createDataFrame(
    [("Nice.", ["positive"]), ("That's bad.", ["negative"])],
    schema=["text", "labels"],
)

pipeline = Pipeline(stages=[document, embds, multiClassifier])
pipeline.fit(trainDataset)
logger.end()
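Note how the pieces fit together: `setOutputLogsPath(OUTPUT_LOG_PATH)` tells Spark NLP where to write its per-epoch training logs, and `logger.monitor()` watches that same directory and forwards the metrics it finds to Comet. As a rough illustration of the idea (the log line shown here is an assumption for illustration, not the library's actual format), extracting metric pairs from such a line might look like:

```python
import re

# Hypothetical per-epoch line, roughly in the style Spark NLP writes to its
# output-logs directory (the exact format is an assumption -- inspect your
# own run's log files to see the real one).
line = "Epoch 0/1 - 1.52s - loss: 0.693 - acc: 0.50"

# Pull out every "name: value" metric pair, the kind of data the
# CometLogger forwards to the Comet UI.
metrics = {
    name: float(value)
    for name, value in re.findall(r"(\w+): ([0-9.]+)", line)
}
print(metrics)  # {'loss': 0.693, 'acc': 0.5}
```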

Getting Started

Getting started with this integration is easy. The following resources will help you start logging Spark NLP runs to Comet in no time:

  • Colab Notebook: Our notebook is ready to run as-is, or you can create a copy if you’d like to modify it.
  • Spark NLP GitHub Repo: Need a crash course on Spark NLP? Check out this GitHub repo for the basics, a workshop full of runnable examples, and much more.
  • A free Comet account: Building with Comet is absolutely free—unlimited public and private projects, 100GB of storage, hyperparameter search, and more.

Want to stay in the loop? Subscribe to the Comet Newsletter for insights and perspective on the latest ML news, projects, and more.

