Continuous Integration and Delivery (CI/CD)

You can use Neptune as a model registry that lets you log, query and download all model-building metadata.

This guide shows how to query a Neptune Run and retrieve all of its model metadata in order to execute Continuous Integration/Continuous Delivery (CI/CD) jobs.

These jobs can be executed:

  • On a schedule

  • Every time you have new data

  • Each time there is a code commit

By the end of this guide, you will have a GitHub Action set up that evaluates whether the staging model (challenger) is better than the production model (current best, the champion) and automatically promotes the new best model to production.

Keywords: CI/CD, Continuous Integration, Continuous Delivery, CI/CD machine learning

Before you start

Make sure you meet the following prerequisites before starting:

Step 1: Get the Challenger (new) and Champion (best) model metadata

In this step, you will:

  1. Get the Neptune project that contains your Runs.

  2. Get the Run IDs of the champion and the challenger models.

  3. Use those Run IDs to resume the Runs in read-only mode so the metadata is not accidentally changed.

  4. Fetch and download data from Neptune.

  5. Load the model checkpoints and data.

First, you have to get the Neptune project that contains the Runs that you want to evaluate. You do that by using the .get_project() method and passing the project name.

import os

import neptune.new as neptune  # assumes the older neptune-client (pre-1.0) API used throughout this guide

project = neptune.get_project(
    api_token=os.getenv("NEPTUNE_API_TOKEN"),
    project='<YOUR_PROJECT_NAME>'
)

Note: You can use project='common/pytorch-integration' and api_token='ANONYMOUS' to explore without having to create a Neptune account.

The api_token is passed as a GitHub Actions environment variable created from a GitHub secret. How to set this up is described in a later section.
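
If you want the job to fail fast when the secret is missing or was not propagated to the workflow, an optional check like the following can help (a minimal sketch, not required by the guide):

# Optional sanity check: fail fast if the secret did not reach the job environment
api_token = os.getenv("NEPTUNE_API_TOKEN")
if api_token is None:
    raise RuntimeError("NEPTUNE_API_TOKEN is not set; check your GitHub secret and the workflow's env block")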

Second, filter the Runs to get the champion's and the challenger's Run IDs.

To do that:

  • Use the .fetch_runs_table() method with the tag argument.

  • Convert the query results into a pandas DataFrame using the .to_pandas() method.

  • Extract the Run ID just as you would from any pandas DataFrame.

# Champion
champion_runs_table_df = project.fetch_runs_table(tag='champion').to_pandas()
champion_run_id = champion_runs_table_df['sys/id'].values[0]
# Challenger
challenger_runs_table_df = project.fetch_runs_table(tag='challenger').to_pandas()
challenger_run_id = challenger_runs_table_df['sys/id'].values[0]

  • Take the Run IDs and use them to resume each Run in read-only mode to avoid logging new metadata or changing existing metadata.

champion_run = neptune.init(
    api_token=os.getenv("NEPTUNE_API_TOKEN"),
    project='<YOUR_PROJECT_NAME>',
    run=champion_run_id,
    mode='read-only'
)

challenger_run = neptune.init(
    api_token=os.getenv("NEPTUNE_API_TOKEN"),
    project='<YOUR_PROJECT_NAME>',
    run=challenger_run_id,
    mode='read-only'
)

Executing this snippet will give you a link like this one: https://app.neptune.ai/common/pytorch-integration/e/PYTOR1-72, with common/pytorch-integration replaced by your_workspace/your_project_name, and PYTOR1-72 replaced by your Run ID.

Query and download model metadata from Neptune

To retrieve metadata from Neptune, you should:

  • Use the .fetch() method to retrieve single values, such as the hyperparameters and the dataset path.

# fetching non-file values (shown for one run handle; repeat for the champion and the challenger)
import torch

parameters = run['config/hyperparameters'].fetch()
parameters['device'] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
data_dir = run['config/dataset/path'].fetch()

  • Use the .download() method to retrieve files, such as the model weights, for both the champion and the challenger.

# Download model weights to a local file
model_fname = 'model.pth'
run['io_files/artifacts/basemodel'].download(f'./{model_fname}')
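
Since Step 2 compares both models, you would typically repeat the fetch and download for both the champion and the challenger Run handles, for example (a minimal sketch; the per-run weight file names are hypothetical):

# Fetch the hyperparameters and download the weights for both Runs
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

champion_params = champion_run['config/hyperparameters'].fetch()
champion_params['device'] = device
challenger_params = challenger_run['config/hyperparameters'].fetch()
challenger_params['device'] = device

champion_weights_fname = 'champion_model.pth'      # hypothetical file name
challenger_weights_fname = 'challenger_model.pth'  # hypothetical file name
champion_run['io_files/artifacts/basemodel'].download(f'./{champion_weights_fname}')
challenger_run['io_files/artifacts/basemodel'].download(f'./{challenger_weights_fname}')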

Load model object and data loaders from metadata

  • Load the data for the evaluation

# loading dataset
from torchvision import datasets, transforms

data_tfms = {
    "val": transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize(
                [0.485, 0.456, 0.406],
                [0.229, 0.224, 0.225]
            ),
        ]
    )
}

validset = datasets.CIFAR10(
    data_dir, train=False, transform=data_tfms["val"], download=True
)
validloader = torch.utils.data.DataLoader(
    validset, batch_size=bs, num_workers=0  # bs is the batch size, e.g. taken from the fetched hyperparameters
)

  • Load the model checkpoints for both the champion and challenger.

# loading model weights
model = BaseModel(
    parameters["input_sz"],
    parameters["input_sz"],
    parameters["n_classes"]
).to(parameters["device"])

checkpoint = torch.load(model_fname, map_location=parameters['device'])
model.load_state_dict(checkpoint)
model.eval()
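
The snippet above loads a single checkpoint into model. Because Step 2 evaluates the champion and the challenger side by side, you would repeat this for each Run, for example with a small helper like the one below (a sketch built on the hypothetical per-run parameters and file names from the earlier sketch):

# Rebuild the architecture and load the downloaded checkpoint for each Run
def load_model(params, weights_fname):
    m = BaseModel(
        params["input_sz"],
        params["input_sz"],
        params["n_classes"]
    ).to(params["device"])
    m.load_state_dict(torch.load(weights_fname, map_location=params["device"]))
    return m.eval()

champion_model = load_model(champion_params, champion_weights_fname)
challenger_model = load_model(challenger_params, challenger_weights_fname)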

Step 2: Run inference on new data and evaluate the accuracy

In this step, you need to run inference on the evaluation set for both the champion and the challenger. If the challenger performs better than the champion, you will:

  • add the tag 'best' to the challenger Run

  • remove the tag 'best' from the champion Run.
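
The snippet below relies on a batch of images and labels from the validation loader and on a get_model_score helper that is not shown in this guide. A minimal sketch of what they could look like (scoring a single batch is an assumption; your own function may evaluate the full loader):

# Take one batch of validation data and move it to the right device
images, labels = next(iter(validloader))
images, labels = images.to(parameters["device"]), labels.to(parameters["device"])

def get_model_score(model, images, labels):
    # Fraction of correctly classified images in the batch
    with torch.no_grad():
        outputs = model(images)
        _, preds = torch.max(outputs, 1)
    return (preds == labels).sum().item() / labels.size(0)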

# Run inference on validation set and get score
champion_score = get_model_score(champion_model, images, labels)
challenger_score = get_model_score(challenger_model, images, labels)

# Test challenger model score against champion model score
assert challenger_score >= champion_score, \
    f'''The challenger model accuracy {round(challenger_score*100, 2)}% is
    lower than the champion accuracy {round(champion_score*100, 2)}%'''

print(f'''The challenger model with Run ID {challenger_run_id}
has an accuracy of {challenger_score*100}%, which is greater than
the current champion's, so it is promoted to production''')
print("------------Evaluation test passed!!!------------")

Step 3: Push the Challenger (new best) to production

Now you need to push the challenger to production using tags.

  • Stop the read-only challenger Run, then resume both the challenger and the former champion Runs in the default (asynchronous) mode so you can change metadata in the Runs; in this case, you want to change the tags.

new_champion_run = neptune.init(
    api_token=os.getenv("NEPTUNE_API_TOKEN"),
    project='<YOUR_PROJECT_NAME>',
    run=challenger_run_id
)
former_champion_run = neptune.init(
    api_token=os.getenv("NEPTUNE_API_TOKEN"),
    project='<YOUR_PROJECT_NAME>',
    run=champion_run_id
)

new_champion_run['sys/tags'].add('best')
former_champion_run['sys/tags'].remove('best')

Note: With this in place, you can create another pipeline that takes the Neptune Run tagged 'best' and deploys it to production.
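
A minimal sketch of what such a downstream pipeline could do, i.e., locate the Run tagged 'best' and download its weights (the deployment step itself and the file name below are hypothetical):

# In a separate deployment job: find the current best Run and grab its weights
best_run_table_df = project.fetch_runs_table(tag='best').to_pandas()
best_run_id = best_run_table_df['sys/id'].values[0]

best_run = neptune.init(
    api_token=os.getenv("NEPTUNE_API_TOKEN"),
    project='<YOUR_PROJECT_NAME>',
    run=best_run_id,
    mode='read-only'
)
best_run['io_files/artifacts/basemodel'].download('./production_model.pth')  # hypothetical file name
# hand the downloaded checkpoint over to your serving / deployment tooling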

Step 4: Create and run a GitHub Action

In this step, you will create the GitHub Actions workflow YAML file that executes the script you built in the previous steps each time new code is pushed to your repo.

After you commit this project to your GitHub repository, you will see the corresponding GitHub Actions activity in the Actions tab of the repo.

First, create a GitHub secret that lets you pass your api_token securely to your Python script as an environment variable.

Set the secret name to NEPTUNE_API_TOKEN and its value to ANONYMOUS (or to your own Neptune API token).

Second, create a YAML file under the .github/workflows folder, then copy and paste the code below.

name: CI/CD
on: push
jobs:
  model-eval-promotion:
    runs-on: ubuntu-latest
    env:
      NEPTUNE_API_TOKEN: ${{ secrets.NEPTUNE_API_TOKEN }}
    steps:
      - uses: actions/checkout@v2      # checkout action (exact version was garbled in the source; v2 assumed)
      - uses: actions/setup-python@v2  # setup-python action (exact version was garbled in the source; v2 assumed)
        with:
          python-version: 3.8
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r CI_CD/requirements.txt
      - name: Model Promotion (CI/CD)
        run: python CI_CD/scripts/model_promotion.py

Finally, commit your code to your GitHub repo!

Summary

In this guide, you learned:

  • How to fetch and download metadata from Neptune Runs into your CI/CD jobs.

  • How to resume Runs in read-only mode to avoid changing the metadata.

  • How to create a GitHub Action that uses the model metadata retrieved from Neptune to run a CI/CD job that evaluates whether the challenger is better than the champion.

See also