# TensorFlow integration guide
**Tip:** See also the Keras integration guide.
In this guide, we'll use Neptune to log metadata while training models with TensorFlow. We'll cover the following:
- Tracking and versioning the training data.
- Logging losses and other metrics generated from training.
- Logging predictions over multiple epochs.
- Saving the generated model to Neptune.
## Before you start
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
To follow this example, have the following installed:

- The Neptune client library (`neptune`)
- TensorFlow 2.x (`tensorflow`)
- `requests`, for downloading the dataset
## Logging example
In this example, we'll work with the MNIST dataset. We'll prepare the data, set up a training loop, and log the metadata with Neptune.
### Create a script
- Import the needed libraries:
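A minimal set, based on what the rest of the script in this guide uses:

```python
import io

import numpy as np
import requests
import tensorflow as tf

import neptune
```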
- Start a Neptune run:
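Assuming your credentials are set as environment variables (see the note below), this can be as simple as:

```python
run = neptune.init_run()
```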
If Neptune can't find your project name or API token

As a best practice, save your Neptune API token and project name as the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables.

Alternatively, you can pass the information when using a function that takes `api_token` and `project` as arguments:

```python
run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # See note 1
    project="ml-team/classification",  # See note 2
)
```

1. In the bottom-left corner, expand the user menu and select **Get my API token**.
2. You can copy the path from the project details (**Details & privacy**).

If you haven't registered, you can log anonymously to a public project:
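For example (the project name below is illustrative; substitute a public project that accepts anonymous logging):

```python
run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/quickstarts",  # hypothetical public project name
)
```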
Make sure not to publish sensitive data through your code!
- Download the MNIST dataset and track its metadata:

```python
response = requests.get(
    "https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz"
)
with open("mnist.npz", "wb") as f:
    f.write(response.content)

run["datasets/version"].track_files("mnist.npz")
```
You can use the `track_files()` method on a file or folder when you want to track the metadata rather than upload the files in full.

Learn more: See how to work with files that are tracked as artifacts: Track artifacts
- Set up train and test sets:
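One way to load the arrays from the archive downloaded above (TensorFlow's `mnist.npz` stores them under the `x_train`, `y_train`, `x_test`, and `y_test` keys):

```python
with np.load("mnist.npz") as data:
    train_examples = data["x_train"]
    train_labels = data["y_train"]
    test_examples = data["x_test"]
    test_labels = data["y_test"]
```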
- Define and log model parameters:

```python
params = {
    "batch_size": 1024,
    "shuffle_buffer_size": 100,
    "lr": 0.001,
    "num_epochs": 10,
    "num_visualization_examples": 10,
}
run["training/model/params"] = params
```
You can use simple assignment (`=`) to log single values or dictionaries to a field in the run. You can define the structure freely. In this case, we're creating the nested namespaces `training/model` and, inside those, the `params` field where the dictionary is logged.

Learn more: Learn about the structure of Neptune objects: Namespaces and fields
- Normalize and prepare the data for training:

```python
def normalize_img(image):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32)


train_examples = normalize_img(train_examples)
test_examples = normalize_img(test_examples)

train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))

train_dataset = train_dataset.shuffle(params["shuffle_buffer_size"]).batch(
    params["batch_size"]
)
test_dataset = test_dataset.batch(params["batch_size"])
```
- Prepare the model:

```python
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ]
)

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(params["lr"])

# Capture the model summary as a string
with io.StringIO() as s:
    model.summary(print_fn=lambda x, **kwargs: s.write(x + "\n"))
    model_summary = s.getvalue()
```
- Log the model summary:
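For example, by assigning the captured summary string to a field (the exact field path is up to you):

```python
run["training/model/summary"] = model_summary
```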
- Set up a training loop with Neptune logging:

```python
def loss_and_preds(model, x, y, training):
    # training=training is needed only if there are layers with different
    # behavior during training versus inference (e.g. Dropout)
    y_ = model(x, training=training)
    return loss_object(y_true=y, y_pred=y_), y_


def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value, _ = loss_and_preds(model, inputs, targets, training=True)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)


for epoch in range(params["num_epochs"]):
    epoch_loss_avg = tf.keras.metrics.Mean()
    epoch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

    for x, y in train_dataset:
        loss_value, grads = grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        epoch_loss_avg.update_state(loss_value)
        epoch_accuracy.update_state(y, model(x, training=True))

    # Log train metrics for the epoch
    run["training/train/loss"].append(epoch_loss_avg.result())
    run["training/train/accuracy"].append(epoch_accuracy.result())

    # Log test metrics
    test_loss, test_preds = loss_and_preds(model, test_examples, test_labels, False)
    run["training/test/loss"].append(test_loss)
    acc = epoch_accuracy(test_labels, test_preds)
    run["training/test/accuracy"].append(acc)

    # Log test predictions as images
    for idx in range(params["num_visualization_examples"]):
        np_image = test_examples[idx].numpy().reshape(28, 28)
        image = neptune.types.File.as_image(np_image)
        pred_label = test_preds[idx].numpy().argmax()
        true_label = test_labels[idx]
        run[f"training/visualization/epoch_{epoch}"].append(
            image, description=f"pred={pred_label} | actual={true_label}"
        )

    if epoch % 5 == 0 or epoch == (params["num_epochs"] - 1):
        print(
            "Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(
                epoch, epoch_loss_avg.result(), epoch_accuracy.result()
            )
        )
```
- To stop the connection to Neptune and sync all data, call the `stop()` method:
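```python
run.stop()
```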
## Run the training
Once you execute the code, you should see a Neptune link printed to the console output. Sample output:

```
[neptune] [info ] Neptune initialized. Open in the app:
https://app.neptune.ai/workspace/project/e/RUN-1
```

Follow the link to open the run in Neptune.
## Analyze the results in Neptune
In the **All metadata** section, you can see our two custom namespaces that contain the logged metadata:

- `datasets` – our dataset metadata is tracked here.
- `training` – contains the other model and training metadata.

The other namespaces are generated by default. They contain automatically logged system and basic metadata.