TensorFlow integration guide#
Tip
See also: Keras integration guide
In this guide, we'll use Neptune to log metadata while training models with TensorFlow. We'll cover the following:
- Tracking and versioning some data.
- Logging losses and other metrics generated from training.
- Logging predictions over multiple epochs.
- Saving the generated model to the model registry.
Before you start#
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
To follow this example, have the tensorflow and neptune packages installed, along with requests, which the script uses to download the dataset.
Upgrading with neptune-client already installed
Important: To smoothly upgrade to the 1.0 version of the Neptune client library, first uninstall the neptune-client library and then install neptune.
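Before running the script, it can help to confirm that the packages it relies on are importable. The following is a minimal sketch that uses only the standard library. The helper name missing_packages is hypothetical, and the package list is inferred from the calls that appear in this guide (requests.get, tf.keras, neptune.init_run).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules the snippets in this guide call into (inferred list).
REQUIRED = ["requests", "tensorflow", "neptune"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Install before continuing:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```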
Logging example#
In this example, we'll work with the MNIST dataset. We'll prepare the data, set up a training loop, and log the metadata with Neptune.
Create a script#
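- Import the needed libraries: io, requests, tensorflow (as tf), and neptune.
- Start a Neptune run by calling neptune.init_run() and assigning the result to run.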
If Neptune can't find your project name or API token
As a best practice, you should save your Neptune API token and project name as environment variables:
export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"
You can, however, also pass them as arguments when initializing Neptune:
run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8",  # your token here
    project="ml-team/classification",  # your full project name here
)
- API token: In the bottom-left corner, expand the user menu and select Get my API token.
- Project name: In the top-right menu, select Edit project details.
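To fail early with a clear message when the credentials are missing, the lookup can be wrapped in a small helper. This is a sketch only: neptune_settings is a hypothetical helper, not part of the Neptune API. It reads the two environment variables named above and returns keyword arguments in the shape that neptune.init_run() is shown taking.

```python
import os


def neptune_settings(env=None):
    """Collect the API token and project name from environment variables.

    Raises RuntimeError naming whichever variables are unset or empty.
    """
    env = os.environ if env is None else env
    names = {"api_token": "NEPTUNE_API_TOKEN", "project": "NEPTUNE_PROJECT"}
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in names.items()}


# Usage (assumes the neptune package is installed and imported):
# run = neptune.init_run(**neptune_settings())
```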
If you haven't registered, you can also log anonymously to a public project by passing neptune.ANONYMOUS_API_TOKEN as the API token. Make sure not to publish sensitive data through your code!
- Download the MNIST dataset and track its metadata:
response = requests.get(
    "https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz"
)
with open("mnist.npz", "wb") as f:
    f.write(response.content)

run["datasets/version"].track_files("mnist.npz")
You can use the track_files() method for a file or folder when you want to track the metadata rather than upload the files in full.
Learn more
See how to work with files that are tracked as artifacts: Track artifacts
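To give a feel for what tracking metadata instead of uploading a file means, the sketch below computes a content hash and size for a local file. It illustrates the general idea only and is not Neptune's implementation. The helper file_fingerprint is hypothetical and the choice of SHA-256 is an assumption.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest and size in bytes of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large datasets don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {"sha256": digest.hexdigest(), "size": os.path.getsize(path)}
```

If the contents of the file change, the digest changes too, which is what makes it possible to tell dataset versions apart without storing the data itself.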
- Set up train and test sets:
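A minimal sketch of this step, assuming NumPy is installed and that the downloaded archive uses the array names of the Keras MNIST file (x_train, y_train, x_test, y_test). The helper name load_npz_splits is hypothetical. The variable names in the usage comment match the ones used in the rest of the guide.

```python
import numpy as np


def load_npz_splits(path):
    """Load train and test arrays from an .npz archive in the Keras MNIST layout."""
    with np.load(path) as data:
        return (
            data["x_train"],
            data["y_train"],
            data["x_test"],
            data["y_test"],
        )


# Usage, once "mnist.npz" has been downloaded:
# train_examples, train_labels, test_examples, test_labels = load_npz_splits("mnist.npz")
```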
- Define and log model parameters:
params = {
    "batch_size": 1024,
    "shuffle_buffer_size": 100,
    "lr": 0.001,
    "num_epochs": 10,
    "num_visualization_examples": 10,
}
run["training/model/params"] = params
You can use simple assignment (=) to log single values or dictionaries to a field in the run. You can define the structure freely. In this case we're creating the nested namespaces "training/model" and, inside those, the "params" field where the dictionary is logged.
Learn more
Learn about the structure of Neptune objects: Namespaces and fields
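As an illustration of how a dictionary maps onto this structure, the sketch below flattens a nested dictionary into slash-separated field paths. It is plain Python for illustration, not Neptune code, and flatten_to_paths is a hypothetical helper.

```python
def flatten_to_paths(values, prefix=""):
    """Flatten a nested dict into {"namespace/field": value} pairs."""
    flat = {}
    for key, value in values.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Nested dictionaries become nested namespaces.
            flat.update(flatten_to_paths(value, path))
        else:
            flat[path] = value
    return flat


params = {"batch_size": 1024, "lr": 0.001}
fields = flatten_to_paths(params, prefix="training/model/params")
# fields == {
#     "training/model/params/batch_size": 1024,
#     "training/model/params/lr": 0.001,
# }
```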
- Normalize and prepare the data for training:
def normalize_img(image):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255.0


train_examples = normalize_img(train_examples)
test_examples = normalize_img(test_examples)

train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))

train_dataset = train_dataset.shuffle(params["shuffle_buffer_size"]).batch(
    params["batch_size"]
)
test_dataset = test_dataset.batch(params["batch_size"])
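For a sense of scale, the arithmetic below works out how many batches each dataset yields with a batch size of 1024. It assumes the standard MNIST split of 60,000 training and 10,000 test images, and that the final partial batch is kept, which is the default behavior of batch(). The helper batch_counts is hypothetical.

```python
import math


def batch_counts(num_examples, batch_size):
    """Return (number of batches, size of the last batch), keeping the remainder."""
    if num_examples <= 0:
        return 0, 0
    num_batches = math.ceil(num_examples / batch_size)
    last_batch = num_examples - (num_batches - 1) * batch_size
    return num_batches, last_batch


print(batch_counts(60_000, 1024))  # (59, 608)
print(batch_counts(10_000, 1024))  # (10, 784)
```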
- Prepare the model:
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ]
)
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(params["lr"])

# Capture the printed model summary as a string
with io.StringIO() as s:
    model.summary(print_fn=lambda x: s.write(x + "\n"))
    model_summary = s.getvalue()
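Because the last Dense layer has no activation, the model outputs raw logits, which is why the loss is created with from_logits=True. The sketch below shows the same calculation for a single example in plain Python: a softmax over the logits followed by the negative log-probability of the true class. It is an illustration rather than the Keras implementation, and the function name is hypothetical.

```python
import math


def sparse_ce_from_logits(logits, label):
    """Return -log(softmax(logits)[label]) using a numerically stable log-sum-exp."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]


# Two equally scored classes: the true class gets probability 0.5,
# so the loss is ln(2).
loss = sparse_ce_from_logits([0.0, 0.0], label=0)
```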
- Log the model summary:
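The summary string captured in the previous step can be assigned to a field like any other value. The sketch below repeats the capture pattern with a stand-in for model.summary() so that it runs without TensorFlow. Both fake_summary and the field path training/model/summary are assumptions made for illustration.

```python
import io


def capture_lines(summary_fn):
    """Call `summary_fn(print_fn=...)` and return everything it prints as one string."""
    with io.StringIO() as buffer:
        summary_fn(print_fn=lambda line: buffer.write(line + "\n"))
        return buffer.getvalue()


def fake_summary(print_fn=print):
    """Stand-in for model.summary(); emits two lines through `print_fn`."""
    print_fn("Layer (type)    Output Shape")
    print_fn("dense (Dense)   (None, 10)")


model_summary = capture_lines(fake_summary)

# With a real model and an active run (the field path is an assumption):
# run["training/model/summary"] = capture_lines(model.summary)
```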
- Set up a training loop with Neptune logging:
def loss_and_preds(model, x, y, training):
    # training=training is needed only if there are layers with different
    # behavior during training versus inference (e.g. Dropout)
    y_ = model(x, training=training)
    return loss_object(y_true=y, y_pred=y_), y_


def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value, _ = loss_and_preds(model, inputs, targets, training=True)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)


for epoch in range(params["num_epochs"]):
    epoch_loss_avg = tf.keras.metrics.Mean()
    epoch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

    for x, y in train_dataset:
        loss_value, grads = grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        epoch_loss_avg.update_state(loss_value)
        epoch_accuracy.update_state(y, model(x, training=True))

    # Log metrics for the epoch
    # Train metrics
    run["training/train/loss"].append(epoch_loss_avg.result())
    run["training/train/accuracy"].append(epoch_accuracy.result())

    # Log test metrics
    test_loss, test_preds = loss_and_preds(model, test_examples, test_labels, False)
    run["training/test/loss"].append(test_loss)
    # Use a separate metric so the test examples aren't mixed into
    # the accumulated train accuracy
    test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
    acc = test_accuracy(test_labels, test_preds)
    run["training/test/accuracy"].append(acc)

    # Log test prediction
    for idx in range(params["num_visualization_examples"]):
        np_image = test_examples[idx].numpy().reshape(28, 28)
        image = neptune.types.File.as_image(np_image)
        pred_label = test_preds[idx].numpy().argmax()
        true_label = test_labels[idx]
        run[f"training/visualization/epoch_{epoch}"].append(
            image, description=f"pred={pred_label} | actual={true_label}"
        )

    if epoch % 5 == 0 or epoch == (params["num_epochs"] - 1):
        print(
            "Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(
                epoch, epoch_loss_avg.result(), epoch_accuracy.result()
            )
        )
- To stop the connection to Neptune and sync all data, call the stop() method:
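One way to make sure stop() runs even if training raises an exception is a try/finally block. The sketch below uses a stand-in object so that it runs without a Neptune connection. DummyRun and train_and_stop are hypothetical; in the real script, the object passed in would be the run returned by neptune.init_run().

```python
class DummyRun:
    """Stand-in for a Neptune run that only records whether stop() was called."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def train_and_stop(run, train_fn):
    """Run `train_fn(run)` and always call `run.stop()` afterwards."""
    try:
        train_fn(run)
    finally:
        run.stop()


dummy = DummyRun()
train_and_stop(dummy, lambda r: None)
```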
Run the training#
Once you execute the code, you should see a Neptune link printed to the console output.
Sample output
https://app.neptune.ai/workspace-name/project-name/e/RUN-100/metadata
The general format is https://app.neptune.ai/<workspace>/<project> followed by the Neptune ID of the initialized object.
Follow the link to open the run in Neptune.
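If you want to pick the workspace, project, and run ID out of the printed link, for example to record them in a log, the path can be split with the standard library. The sketch assumes the layout of the sample output above, with the ID following an "e" path segment. The helper parse_run_url is hypothetical.

```python
from urllib.parse import urlparse


def parse_run_url(url):
    """Split a run link into workspace, project, and run ID."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 4 or parts[2] != "e":
        raise ValueError(f"Unexpected run URL layout: {url}")
    return {"workspace": parts[0], "project": parts[1], "run_id": parts[3]}


info = parse_run_url(
    "https://app.neptune.ai/workspace-name/project-name/e/RUN-100/metadata"
)
# info == {"workspace": "workspace-name", "project": "project-name", "run_id": "RUN-100"}
```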
Analyze the results in Neptune#
In the All metadata section, you can see our two custom namespaces that contain logged metadata:
- datasets – our dataset metadata is tracked here.
- training – contains other model and training metadata.
The other namespaces are generated by default. They contain automatically logged system and basic metadata.
More options#
Saving to the model registry#
To organize your model training metadata separately from the runs, you can log the metadata to model objects. This will make the data appear in the Models section of the project.
Register a model#
You first need to register a unique model. You can then create as many versions of it as you like.
The example below registers a model with the key KERAS:
model_object = neptune.init_model(
    key="KERAS",
    name="Keras model",  # optional
    description="Model trained on MNIST with Keras",  # optional
)
model_object.stop()
Create a model version#
We can now initialize a version of the model we created above. If our project key is CLAS, we identify the model with that and the model key together:
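A minimal sketch of how the two keys combine, using the values from this guide. It assumes that the model ID is the project key and the model key joined by a hyphen. The helper model_id is hypothetical, and the commented call assumes the neptune.init_model_version() function of the 1.x client.

```python
def model_id(project_key, model_key):
    """Join a project key and a model key into a model ID such as "CLAS-KERAS"."""
    return f"{project_key}-{model_key}"


print(model_id("CLAS", "KERAS"))  # CLAS-KERAS

# With the Neptune client installed, a version of that model could then be created:
# model_version = neptune.init_model_version(model=model_id("CLAS", "KERAS"))
```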
Now, we can log the metadata to the model version object. We can use the same metadata tracking methods as for runs.
model_version["run_id"] = run["sys/id"].fetch()
model_version["metrics/test_loss"] = test_loss
model_version["metrics/test_accuracy"] = acc
model_version["datasets/version"].track_files("mnist.npz")
# Save model artifacts to "weights" folder
# (model refers to the Keras Sequential model created earlier)
model.save("weights")
# Upload model artifacts
model_version["model/weights"].upload_files("weights/*")
Finally, remember to stop Neptune objects once they're no longer needed.