
🤗 Transformers integration guide#


Custom dashboard displaying metadata logged with Transformers

🤗 Transformers by Hugging Face is a popular framework for model training in the natural language processing domain. You can integrate metadata tracking with Neptune in two ways:

  • Pass report_to="neptune" in the TrainingArguments.
  • Set up a Neptune callback and pass it to the Trainer callbacks.


Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.
  • Have 🤗 Transformers and Neptune installed.

    pip install -U neptune transformers
    
    conda install -c conda-forge neptune transformers
    
Passing your Neptune credentials

Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"

To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.

export NEPTUNE_PROJECT="ml-team/classification"

Your full project name has the form workspace-name/project-name. You can copy it from the project settings: Click the menu in the top-right → Edit project details.

On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE "some-value"
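Before starting a training script, it can help to confirm that both variables are actually visible to Python. The following is a minimal sketch using only the standard library; the helper name check_neptune_env is hypothetical and not part of Neptune or Transformers, and it only checks that the values are present and that the project name has the workspace-name/project-name form. It does not verify the token against Neptune.

```python
import os


def check_neptune_env(environ=os.environ):
    """Return a list of problems found with the Neptune environment variables.

    Hypothetical helper for illustration; not part of the Neptune API.
    """
    problems = []

    token = environ.get("NEPTUNE_API_TOKEN", "").strip()
    if not token:
        problems.append("NEPTUNE_API_TOKEN is not set")

    project = environ.get("NEPTUNE_PROJECT", "").strip()
    if not project:
        problems.append("NEPTUNE_PROJECT is not set")
    else:
        # The full project name should look like "workspace-name/project-name".
        workspace, sep, name = project.partition("/")
        if not (sep and workspace and name) or "/" in name:
            problems.append(
                "NEPTUNE_PROJECT should have the form workspace-name/project-name"
            )

    return problems


if __name__ == "__main__":
    for problem in check_neptune_env():
        print("Warning:", problem)
```

An empty list means both variables are set and the project name is well formed.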


Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune.

run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)

For more help, see Set Neptune credentials.

Using the integration#

In your model training script, import NeptuneCallback:

from transformers.integrations import NeptuneCallback

You have a couple of options for enabling Neptune logging in your script:

In your TrainingArguments, set the report_to argument to "neptune":

training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    eval_steps=20,
    report_to="neptune",
)

trainer = Trainer(
    model,
    training_args,
    ...,
)

In this case, your Neptune credentials must be available through environment variables.

How do I save my credentials as environment variables?

Set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

Linux or macOS:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
export NEPTUNE_PROJECT="ml-team/classification"

Windows:

setx NEPTUNE_API_TOKEN "h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
setx NEPTUNE_PROJECT "ml-team/classification"

You can also navigate to Settings → Edit the system environment variables and add the variables there.

Jupyter Notebook:

%env NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
%env NEPTUNE_PROJECT="ml-team/classification"

To find your credentials:

  • API token: In the bottom-left corner of the Neptune app, expand your user menu and select Get your API token. If you need the token of a service account, go to the workspace or project settings and enter the Service accounts settings.
  • Project name: Your full project name has the form workspace-name/project-name. You can copy it from the project menu (Edit project details).

If you're working in Google Colab, you can set your credentials with the os and getpass libraries:

import os
from getpass import getpass
os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"

Alternatively, for more logging options, create a Neptune callback:

neptune_callback = NeptuneCallback()

Tip

To add more detail to the tracked run, you can supply optional arguments to NeptuneCallback. See Changing what the callback logs.

Then pass the callback to the Trainer:

# Set report_to="none" to avoid creating several callbacks.
training_args = TrainingArguments(..., report_to="none")
trainer = Trainer(
    model,
    training_args,
    ...,
    callbacks=[neptune_callback],
)

Now, when you start the training with trainer.train(), your metadata will be logged in Neptune.

Note for Jupyter Notebook

Normally in notebooks and other interactive environments, you need to manually stop the Neptune run object with run.stop(). In this case, the Neptune run is stopped automatically when trainer.train() finishes, so manual stopping is not required.

More options#

Passing a Neptune run to the callback#

If you initialize a Neptune run first, you can pass that to the callback.

import neptune
from transformers import Trainer, TrainingArguments
from transformers.integrations import NeptuneCallback

training_args = TrainingArguments(..., report_to="none")

run = neptune.init_run(
    name="distilbert-mrpc", description="DistilBERT fine-tuned on GLUE/MRPC"
)

neptune_callback = NeptuneCallback(run=run)
trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()

Related

API reference: init_run()

Logging additional metadata after training#

The Neptune run is stopped automatically when trainer.train() finishes. If you wish to continue logging to the run, you have a couple of options:

Option 1: Use the trainer method#

You can log additional metadata with trainer.log():

trainer.train()

# logging additional metadata after training
trainer.log(extra_metadata)
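The snippet above leaves extra_metadata undefined. As a rough sketch, assuming trainer.log() expects a flat dictionary that maps metric names to numeric values, nested results could be prepared like this. The helper flatten_metrics, the slash-separated key naming, and the sample values are all hypothetical and only serve as illustration.

```python
def flatten_metrics(nested, prefix=""):
    """Flatten nested metric dicts into {"a/b": float}, skipping non-numeric values.

    Hypothetical helper for illustration; not part of Transformers or Neptune.
    """
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the key prefix.
            flat.update(flatten_metrics(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = float(value)
    return flat


# Made-up example results; replace with your own evaluation output.
results = {"test": {"accuracy": 0.87, "f1": 0.9}, "notes": "held-out split"}
extra_metadata = flatten_metrics(results)
# extra_metadata can then be passed to trainer.log(extra_metadata)
```

Non-numeric entries such as the "notes" string are dropped here, on the assumption that only scalar metrics should be sent through trainer.log().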

Option 2: Get the run object and use the Neptune API#

If you used a Neptune callback, you can also resume the run with the get_run() method:

trainer.train()

# logging additional metadata after training
run = NeptuneCallback.get_run(trainer)
run["predictions"] = ...

Alternatively, if you have a NeptuneCallback instance available, you can access its .run property:

trainer.train()

# logging additional metadata after training
neptune_callback.run["predictions"] = ...

Running multiple trainings for the same Trainer#

You can run multiple training sessions for the same Trainer object. However, each separate session will be logged to a new Neptune run.

To learn how to resume logging to an existing run, see Logging additional metadata after training.

training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    num_train_epochs=2,
    report_to="neptune",
)

trainer = Trainer(model, training_args)

# First training session
trainer.train()

# Option: logging auxiliary metadata to the 1st training session
trainer.log(my_metadata)

# Second training session
trainer.train()

Changing what the callback logs#

You can control what metadata is logged through the Neptune callback by passing some extra arguments.

For the full list of options, see the API reference.

training_args = TrainingArguments(
    ...,
    save_strategy="epoch",  # Sets the checkpoint save strategy
    report_to="none",
)

neptune_callback = NeptuneCallback(
    ...,
    tags=["args-callback", "thin"],
    capture_hardware_metrics=False,
    log_parameters=False,
    log_checkpoints=None,
)

trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()