
TensorFlow integration guide#

Tip

See also: Keras integration guide


Custom dashboard displaying metadata logged with TensorFlow

In this guide, we'll use Neptune to log metadata while training models with TensorFlow. We'll cover the following:

  • Tracking and versioning the training data.
  • Logging losses and other metrics generated from training.
  • Logging predictions over multiple epochs.
  • Saving the generated model to Neptune.

See example in Neptune · Example scripts

Before you start#

To follow this example, install the required libraries:

pip install -U neptune tensorflow numpy requests

Logging example#

In this example, we'll work with the MNIST dataset. We'll prepare the data, set up a training loop, and log the metadata with Neptune.

Create a script#

  1. Import the needed libraries:

    import io
    
    import requests
    import tensorflow as tf
    import numpy as np
    
    import neptune
    
  2. Start a Neptune run:

    run = neptune.init_run()
    
    If Neptune can't find your project name or API token

    As a best practice, you should save your Neptune API token and project name as environment variables:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
    
    export NEPTUNE_PROJECT="ml-team/classification"
    

    Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

    run = neptune.init_run(
        api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8", # (1)!
        project="ml-team/classification", # (2)!
    )
    
    1. In the bottom-left corner, expand the user menu and select Get my API token.
    2. You can copy the path from the project details ( Details & privacy).

    If you haven't registered, you can log anonymously to a public project:

    api_token=neptune.ANONYMOUS_API_TOKEN
    project="common/quickstarts"
    

    Make sure not to publish sensitive data through your code!

  3. Download the MNIST dataset and track its metadata:

    response = requests.get(
        "https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz"
    )
    with open("mnist.npz", "wb") as f:
        f.write(response.content)
    
    run["datasets/version"].track_files("mnist.npz")
    

    You can use the track_files() method on a file or folder when you want to track metadata about the data rather than upload the files in full. A folder example follows the note below.

    Learn more

    See how to work with files that are tracked as artifacts: Track artifacts
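
    track_files() also works for a whole folder. For example, if the raw data lived in a local directory, you could version it in one call (a sketch; the "data/train" path and "datasets/train" field are hypothetical):

    run["datasets/train"].track_files("data/train")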

  4. Set up train and test sets:

    with np.load("mnist.npz") as data:
        train_examples = data["x_train"]
        train_labels = data["y_train"]
        test_examples = data["x_test"]
        test_labels = data["y_test"]
    
  5. Define and log model parameters:

    params = {
        "batch_size": 1024,
        "shuffle_buffer_size": 100,
        "lr": 0.001,
        "num_epochs": 10,
        "num_visualization_examples": 10,
    }
    
    run["training/model/params"] = params
    

    You can use simple assignment (=) to log single values or dictionaries to a field in the run. You can define the structure freely. In this case, we're creating the nested namespace "training/model" and, inside it, the "params" field where the dictionary is logged.

    Learn more

    Learn about the structure of Neptune objects: Namespaces and fields
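
    The nested path works the same way for single values. As a minimal sketch, this single assignment would create the same "training/model/params/lr" field as the corresponding dictionary entry above:

    run["training/model/params/lr"] = 0.001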

  6. Normalize and prepare the data for training:

    def normalize_img(image):
        """Normalizes images: `uint8` -> `float32`."""
        return tf.cast(image, tf.float32) / 255.0  # scale pixel values to [0, 1]
    
    
    train_examples = normalize_img(train_examples)
    test_examples = normalize_img(test_examples)
    
    train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
    test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))
    
    train_dataset = train_dataset.shuffle(params["shuffle_buffer_size"]).batch(
        params["batch_size"]
    )
    test_dataset = test_dataset.batch(params["batch_size"])
    
  7. Prepare the model:

    model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Input(shape=(28, 28)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10),
        ]
    )
    
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam(params["lr"])
    
    with io.StringIO() as s:
        model.summary(print_fn=lambda x, **kwargs: s.write(x + "\n"))
        model_summary = s.getvalue()
    
  8. Log the model summary:

    run["training/model/summary"] = model_summary
    
  9. Set up a training loop with Neptune logging:

    def loss_and_preds(model, x, y, training):
        # training=training is needed only if there are layers with different
        # behavior during training versus inference (e.g. Dropout)
        y_ = model(x, training=training)
    
        return loss_object(y_true=y, y_pred=y_), y_
    
    
    def grad(model, inputs, targets):
        with tf.GradientTape() as tape:
            loss_value, _ = loss_and_preds(model, inputs, targets, training=True)
        return loss_value, tape.gradient(loss_value, model.trainable_variables)
    
    
    for epoch in range(params["num_epochs"]):
        epoch_loss_avg = tf.keras.metrics.Mean()
        epoch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
    
        for x, y in train_dataset:
            loss_value, grads = grad(model, x, y)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
    
            epoch_loss_avg.update_state(loss_value)
            epoch_accuracy.update_state(y, model(x, training=True))
    
        # Log train metrics for the epoch
        run["training/train/loss"].append(epoch_loss_avg.result())
        run["training/train/accuracy"].append(epoch_accuracy.result())

        # Log test metrics, using a separate metric object so the test accuracy
        # isn't mixed with the accumulated train batches
        test_loss, test_preds = loss_and_preds(model, test_examples, test_labels, training=False)
        run["training/test/loss"].append(test_loss)
        test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
        test_accuracy.update_state(test_labels, test_preds)
        run["training/test/accuracy"].append(test_accuracy.result())
    
        # Log test prediction
        for idx in range(params["num_visualization_examples"]):
            np_image = test_examples[idx].numpy().reshape(28, 28)
            image = neptune.types.File.as_image(np_image)
            pred_label = test_preds[idx].numpy().argmax()
            true_label = test_labels[idx]
            run[f"training/visualization/epoch_{epoch}"].append(
                image, description=f"pred={pred_label} | actual={true_label}"
            )
    
        if epoch % 5 == 0 or epoch == (params["num_epochs"] - 1):
            print(
                "Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(
                    epoch, epoch_loss_avg.result(), epoch_accuracy.result()
                )
            )
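
    The overview at the top of this guide also mentions saving the generated model to Neptune. One way to do that before stopping the run is to save the model to a file and upload it; a minimal sketch, where the "model.keras" file name and the "training/model/saved_model" field are example choices:

    model.save("model.keras")
    run["training/model/saved_model"].upload("model.keras")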
    
  10. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
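
    Alternatively, if your client version supports using the run as a context manager, you can create it in a with block and skip the explicit stop() call, since the run is stopped automatically on exit:

    with neptune.init_run() as run:
        ...  # training and logging code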
    

Run the training#

Once you execute the code, you should see a Neptune link printed to the console output.

Sample output

[neptune] [info ] Neptune initialized. Open in the app: https://app.neptune.ai/workspace/project/e/RUN-1

Follow the link to open the run in Neptune.


Analyze the results in Neptune#

In the All metadata section, you can see our two custom namespaces that contain logged metadata:

  • datasets – our dataset metadata is tracked here.
  • training – contains other model and training metadata.

The other namespaces are generated by default. They contain automatically logged system and basic metadata.
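
You can also read the logged values back programmatically. A minimal sketch, assuming the run ID from the console output above and that your client version supports read-only mode:

run = neptune.init_run(with_id="RUN-1", mode="read-only")
accuracy = run["training/test/accuracy"].fetch_values()  # returns a pandas DataFrame of the series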

See example in Neptune