
TensorFlow integration guide

Tip: See also the Keras integration guide.


In this guide, we'll use Neptune to log metadata while training models with TensorFlow. We'll cover the following:

  • Tracking and versioning the dataset.
  • Logging losses and other metrics generated during training.
  • Logging predictions over multiple epochs.
  • Saving the trained model to the model registry.


Before you start

To follow this example, install the following libraries with pip:

pip install -U neptune tensorflow numpy requests

Or install them with conda:

conda install -c conda-forge neptune tensorflow numpy requests
Upgrading with neptune-client already installed

Important: To smoothly upgrade to the 1.0 version of the Neptune client library, first uninstall the neptune-client library and then install neptune.

pip uninstall neptune-client
pip install neptune
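If you aren't sure which client package is currently installed, the following standard-library sketch reports the installed version of each distribution, or that it is missing. The installed_version() helper is hypothetical and not part of Neptune:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it's missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# neptune-client is the legacy package name; neptune is the 1.0+ package.
for name in ("neptune-client", "neptune"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If neptune-client shows a version, uninstall it before installing neptune, as described above.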

Logging example

In this example, we'll work with the MNIST dataset. We'll prepare the data, set up a training loop, and log the metadata with Neptune.

Create a script

  1. Import the needed libraries:

    import io
    
    import requests
    import tensorflow as tf
    import numpy as np
    
    import neptune
    
  2. Start a Neptune run:

    run = neptune.init_run()
    
    If Neptune can't find your project name or API token

    As a best practice, you should save your Neptune API token and project name as environment variables:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8"
    export NEPTUNE_PROJECT="ml-team/classification"
    

    You can, however, also pass them as arguments when initializing Neptune:

    run = neptune.init_run(
        api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8",  # your token here
        project="ml-team/classification",  # your full project name here
    )
    
    To find these values in the Neptune app:

    • API token: In the bottom-left corner, expand the user menu and select Get my API token.
    • Project name: In the top-right menu, select Edit project details.

    If you haven't registered, you can also log anonymously to a public project (make sure not to publish sensitive data through your code!):

    run = neptune.init_run(
        api_token=neptune.ANONYMOUS_API_TOKEN,
        project="common/quickstarts",
    )
    
  3. Download the MNIST dataset and track its metadata:

    response = requests.get(
        "https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz"
    )
    response.raise_for_status()  # stop early if the download failed
    with open("mnist.npz", "wb") as f:
        f.write(response.content)
    
    run["datasets/version"].track_files("mnist.npz")
    

    You can use the track_files() method for a file or folder when you want to track the metadata rather than upload the files in full.

    Learn more: To work with files that are tracked as artifacts, see Track artifacts.

  4. Set up train and test sets:

    with np.load("mnist.npz") as data:
        train_examples = data["x_train"]
        train_labels = data["y_train"]
        test_examples = data["x_test"]
        test_labels = data["y_test"]
    
  5. Define and log model parameters:

    params = {
        "batch_size": 1024,
        "shuffle_buffer_size": 100,
        "lr": 0.001,
        "num_epochs": 10,
        "num_visualization_examples": 10,
    }
    
    run["training/model/params"] = params
    

    You can use simple assignment (=) to log a single value or a dictionary to the run, and you can define the structure freely. Here, we're creating the nested namespaces "training" and "model" and, inside them, a "params" namespace that holds one field per dictionary key (for example, "training/model/params/lr").

    Learn more: To learn about the structure of Neptune objects, see Namespaces and fields.

  6. Normalize and prepare the data for training:

    def normalize_img(image):
        """Normalizes images: `uint8` -> `float32`."""
        return tf.cast(image, tf.float32) / 255.0
    
    
    train_examples = normalize_img(train_examples)
    test_examples = normalize_img(test_examples)
    
    train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
    test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))
    
    train_dataset = train_dataset.shuffle(params["shuffle_buffer_size"]).batch(
        params["batch_size"]
    )
    test_dataset = test_dataset.batch(params["batch_size"])
    
  7. Prepare the model:

    model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10),
        ]
    )
    
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam(params["lr"])
    
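    with io.StringIO() as s:
        # Capture the summary text instead of printing it. **kwargs absorbs any
        # keyword arguments that some Keras versions pass to print_fn.
        model.summary(print_fn=lambda x, **kwargs: s.write(x + "\n"))
        model_summary = s.getvalue()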
    
  8. Log the model summary:

    run["training/model/summary"] = model_summary
    
  9. Set up a training loop that logs metrics and sample predictions to Neptune:

    def loss_and_preds(model, x, y, training):
        # training=training is needed only if there are layers with different
        # behavior during training versus inference (e.g. Dropout)
        y_ = model(x, training=training)
    
        return loss_object(y_true=y, y_pred=y_), y_
    
    
    def grad(model, inputs, targets):
        with tf.GradientTape() as tape:
            loss_value, _ = loss_and_preds(model, inputs, targets, training=True)
        return loss_value, tape.gradient(loss_value, model.trainable_variables)
    
    
    for epoch in range(params["num_epochs"]):
        epoch_loss_avg = tf.keras.metrics.Mean()
        epoch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
    
        for x, y in train_dataset:
            loss_value, grads = grad(model, x, y)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
    
            epoch_loss_avg.update_state(loss_value)
            epoch_accuracy.update_state(y, model(x, training=True))
    
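        # Log metrics for the epoch
        # Train metrics (converted to Python floats before logging)
        run["training/train/loss"].append(float(epoch_loss_avg.result()))
        run["training/train/accuracy"].append(float(epoch_accuracy.result()))

        # Log test metrics. A separate metric object keeps the test predictions
        # from being mixed into the training accuracy.
        test_loss, test_preds = loss_and_preds(model, test_examples, test_labels, False)
        run["training/test/loss"].append(float(test_loss))
        test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
        acc = test_accuracy(test_labels, test_preds)
        run["training/test/accuracy"].append(float(acc))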
    
        # Log test prediction
        for idx in range(params["num_visualization_examples"]):
            np_image = test_examples[idx].numpy().reshape(28, 28)
            image = neptune.types.File.as_image(np_image)
            pred_label = test_preds[idx].numpy().argmax()
            true_label = test_labels[idx]
            run[f"training/visualization/epoch_{epoch}"].append(
                image, description=f"pred={pred_label} | actual={true_label}"
            )
    
        if epoch % 5 == 0 or epoch == (params["num_epochs"] - 1):
            print(
                "Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(
                    epoch, epoch_loss_avg.result(), epoch_accuracy.result()
                )
            )
    
  10. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
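In step 6, train_dataset.shuffle(params["shuffle_buffer_size"]) shuffles through a buffer of only 100 elements. According to the tf.data documentation, shuffle() fills a buffer with buffer_size elements, samples randomly from it, and replaces each sampled element with the next input. A buffer much smaller than the 60,000 MNIST training images therefore gives only a partial shuffle. The pure-Python sketch below mimics that behavior to make the effect visible. buffered_shuffle() is a hypothetical illustration, not TensorFlow's implementation:

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in the order a fixed-size shuffle buffer would produce them.

    Sketch of the documented tf.data.Dataset.shuffle behavior: fill a buffer
    with buffer_size elements, emit a random one, and put the next input item
    in its place. Not TensorFlow's actual implementation.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]  # emit a random buffered element...
        buffer[idx] = item  # ...and replace it with the next input
    # Input exhausted: emit whatever is left in the buffer in random order.
    rng.shuffle(buffer)
    yield from buffer


order = list(buffered_shuffle(range(60_000), buffer_size=100, seed=0))
# An element can be delayed arbitrarily long, but it never appears more than
# 99 positions earlier than its original index, because it must already be
# in the 100-element buffer to be emitted.
print(max(value - position for position, value in enumerate(order)))
```

Raising shuffle_buffer_size toward the dataset size gives a more thorough shuffle at the cost of memory. The TensorFlow documentation notes that a buffer at least as large as the dataset is needed for a perfect shuffle.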
    

Run the training

Once you execute the code, you should see a Neptune link printed to the console output.

Sample output

https://app.neptune.ai/workspace-name/project-name/e/RUN-100/metadata

The general format is https://app.neptune.ai/<workspace>/<project> followed by the Neptune ID of the initialized object.
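As a worked example of that format, the hypothetical run_url() helper below rebuilds the sample link from a full project name (as in NEPTUNE_PROJECT) and a run ID. The /e/ segment and the /metadata suffix are copied from the sample output, not from a documented specification, so treat them as assumptions:

```python
def run_url(full_project_name, run_id):
    """Build a run link like the sample output (URL layout assumed from that sample)."""
    # A full project name has the form "<workspace>/<project>".
    workspace, project = full_project_name.split("/", 1)
    return f"https://app.neptune.ai/{workspace}/{project}/e/{run_id}/metadata"


print(run_url("workspace-name/project-name", "RUN-100"))
# https://app.neptune.ai/workspace-name/project-name/e/RUN-100/metadata
```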

Follow the link to open the run in Neptune.


Analyze the results in Neptune

In the All metadata section, you can see our two custom namespaces that contain logged metadata:

  • datasets – our dataset metadata is tracked here.
  • training – contains other model and training metadata.

The other namespaces, such as sys, are created by default. They contain automatically logged system and basic metadata.
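Under training, the params dictionary logged in step 5 doesn't appear as a single value: assigning a dictionary creates a namespace with one field per key. The hypothetical field_paths() helper below (plain Python, not part of the Neptune API) lists the field paths to expect for a nested dictionary, which can help when looking for a value in All metadata:

```python
def field_paths(value, prefix):
    """List the field paths a (possibly nested) dict would occupy under prefix.

    Illustration only: assumes every dict key becomes a sub-namespace or field.
    """
    if not isinstance(value, dict):
        return [prefix]
    paths = []
    for key, sub_value in value.items():
        paths.extend(field_paths(sub_value, f"{prefix}/{key}"))
    return paths


params = {
    "batch_size": 1024,
    "shuffle_buffer_size": 100,
    "lr": 0.001,
    "num_epochs": 10,
    "num_visualization_examples": 10,
}

for path in field_paths(params, "training/model/params"):
    print(path)
# training/model/params/batch_size
# training/model/params/shuffle_buffer_size
# training/model/params/lr
# training/model/params/num_epochs
# training/model/params/num_visualization_examples
```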


More options

Saving to the model registry

To organize your model training metadata separately from the runs, you can log the metadata to model objects. This will make the data appear in the Models section of the project.

Register a model

You first need to register a unique model. You can then create as many versions of it as you like.

The following example registers a model with the key KERAS:

model_object = neptune.init_model(
    key="KERAS",
    name="Keras model",  # optional
    description="Model trained on MNIST with Keras",  # optional
)

model_object.stop()

Create a model version

We can now initialize a version of the model registered above. A model's ID combines the project key and the model key, so if our project key is CLAS, the model ID is CLAS-KERAS:

model_version = neptune.init_model_version(
    model="CLAS-KERAS",
)
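To make the ID scheme concrete, the hypothetical helpers below compose the model ID from the project and model keys. The model_version_id() helper assumes that Neptune numbers versions sequentially and appends the number to the model ID (for example, CLAS-KERAS-1); check the IDs shown in your Models section before relying on it:

```python
def model_id(project_key, model_key):
    """Compose a model ID from the project key and the model key."""
    return f"{project_key}-{model_key}"


def model_version_id(project_key, model_key, version_number):
    """Compose a model version ID (sequential numbering is an assumption)."""
    return f"{model_id(project_key, model_key)}-{version_number}"


print(model_id("CLAS", "KERAS"))  # CLAS-KERAS
print(model_version_id("CLAS", "KERAS", 1))  # CLAS-KERAS-1
```

An existing version ID can be passed as with_id to neptune.init_model_version() to reopen that version and log more metadata to it.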

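Now we can log metadata to the model version object, using the same methods as for runs. The snippet below reads the run's ID and reuses the model, test_loss, and acc objects from the training script, so run it in the same script, before calling run.stop().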

import os

model_version["run_id"] = run["sys/id"].fetch()
model_version["metrics/test_loss"] = float(test_loss)
model_version["metrics/test_accuracy"] = float(acc)
model_version["datasets/version"].track_files("mnist.npz")

# Save the Keras Sequential model created earlier to the "weights" folder.
# Keras 3 (the tf.keras default from TensorFlow 2.16) requires a .keras or .h5
# file extension; with older TensorFlow versions, model.save("weights") also works.
os.makedirs("weights", exist_ok=True)
model.save("weights/model.keras")

# Upload model artifacts
model_version["model/weights"].upload_files("weights/*")

Finally, remember to stop Neptune objects once they're no longer needed.

model_version.stop()
