🤗 Transformers integration guide#
🤗 Transformers by Hugging Face is a popular framework for model training in the natural language processing domain. You can integrate metadata tracking with Neptune in two ways:
- passing report_to="neptune" to the Trainer arguments.
- setting up a Neptune callback and passing it to the Trainer callbacks.
Before you start#
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
- Have 🤗 Transformers and Neptune installed.
Upgrading with neptune-client already installed
Important: To smoothly upgrade to the 1.0 version of the Neptune client library, first uninstall the neptune-client library and then install neptune.
Passing your Neptune credentials
Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.
To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.
To find your project: Your full project name has the form workspace-name/project-name. To copy the name, click the menu in the top-right corner and select Edit project details.
Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune:
import neptune

run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)
For more help, see Set Neptune credentials.
Using the integration#
In your model-training script, import NeptuneCallback:
from transformers.integrations import NeptuneCallback
You have a couple of options for enabling Neptune logging in your script:
In your TrainingArguments, set the report_to argument to "neptune":
training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    eval_steps=20,
    report_to="neptune",
)
trainer = Trainer(
    model,
    training_args,
    ...,
)
In this case, your Neptune credentials must be available through environment variables.
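Because report_to="neptune" relies entirely on the environment, it can help to check the two variables before starting a long training job. The following is a minimal sketch using only the standard library; the helper name missing_neptune_credentials is illustrative and not part of Neptune or 🤗 Transformers.

```python
import os

# The two variables named in this guide.
REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it only inspects the mapping
    it is given and does not contact Neptune.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name)]


# Typical use before building the Trainer:
# missing = missing_neptune_credentials()
# if missing:
#     raise RuntimeError(f"Set these variables first: {', '.join(missing)}")
```

The check only confirms that the variables are present. It does not verify that the token is valid or that the project exists.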
How do I save my credentials as environment variables?
Set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.
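For example, on Linux or macOS:
export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"
- On Windows, the command is set instead of export.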
Finding your credentials:
- API token: In the bottom-left corner of the Neptune app, expand your user menu and select Get your API token.
- Project: Your full project name has the form workspace-name/project-name. To copy the name, click the menu in the top-right corner and select Edit project details.
If you're working in Colab, you can set your credentials with the os and getpass libraries:
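As a minimal sketch of the Colab approach, assuming you prefer to type the token interactively rather than store it in the notebook, the helper below prompts for the token with getpass and writes both variables to os.environ. The function name set_neptune_credentials and its prompt parameter are illustrative and not part of Neptune or 🤗 Transformers.

```python
import os
from getpass import getpass


def set_neptune_credentials(project, prompt=getpass):
    """Store Neptune credentials in environment variables for this session.

    `project` is the full project name (workspace-name/project-name).
    `prompt` defaults to getpass, so the token is not echoed in the
    notebook output; it is a parameter only to make the helper easy to test.
    """
    os.environ["NEPTUNE_API_TOKEN"] = prompt("Enter your Neptune API token: ")
    os.environ["NEPTUNE_PROJECT"] = project


# In a notebook cell, before creating the Trainer:
# set_neptune_credentials("workspace-name/project-name")
```

Variables set this way last only for the current runtime, so the cell needs to be run again after a restart.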
Alternatively, for more logging options, create a Neptune callback:
neptune_callback = NeptuneCallback()
Tip
To add more detail to the tracked run, you can supply optional arguments to NeptuneCallback. See Changing what the callback logs.
Then pass the callback to the Trainer:
training_args = TrainingArguments(..., report_to="none")  # avoids creating several callbacks
trainer = Trainer(
    model,
    training_args,
    ...,
    callbacks=[neptune_callback],
)
- To avoid creating several callbacks, set the report_to argument to "none". This will be the default behavior in version 5 of 🤗 Transformers.
Now, when you start the training with trainer.train(), your metadata will be logged in Neptune.
Note for Jupyter Notebook
Normally in notebooks and other interactive environments, you need to manually stop the Neptune run object with run.stop(). In this case, the Neptune run is stopped automatically when trainer.train() finishes, so manual stopping is not required.
More options#
Passing a Neptune run to the callback#
If you initialize a Neptune run first, you can pass that to the callback.
from transformers import Trainer, TrainingArguments
from transformers.integrations import NeptuneCallback
import neptune

training_args = TrainingArguments(..., report_to="none")

# Initialize the run yourself, then hand it to the callback
run = neptune.init_run(
    name="distilbert-mrpc",
    description="DistilBERT fine-tuned on GLUE/MRPC",
)
neptune_callback = NeptuneCallback(run=run)

trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()
Related
API ≫ init_run()
Logging additional metadata after training#
The Neptune run is stopped automatically when trainer.train() finishes. If you wish to continue logging to the run, you have a couple of options:
Option 1: Use the trainer method#
You can log additional metadata with trainer.log():
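A minimal sketch of this option follows. It assumes the extra metadata is a flat dictionary of numeric values, which matches the type hint of Trainer.log(). The helper log_extra_metrics and the metric name in the usage comment are hypothetical; the only call made on the trainer is trainer.log().

```python
def log_extra_metrics(trainer, metrics):
    """Send a flat dict of numeric values through trainer.log().

    Hypothetical helper for illustration. Values are converted to float
    first, on the assumption that only numeric entries are useful here.
    Returns the dict that was passed to trainer.log().
    """
    numeric = {name: float(value) for name, value in metrics.items()}
    trainer.log(numeric)
    return numeric


# After training has finished:
# trainer.train()
# log_extra_metrics(trainer, {"test_accuracy": 0.91})
```

Converting up front makes a non-numeric value fail immediately with a ValueError instead of being passed along to the callbacks.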
Option 2: Get the run object and use the Neptune API#
If you used a Neptune callback, you can also resume the run with the get_run() method:
trainer.train()
# logging additional metadata after training
run = NeptuneCallback.get_run(trainer)
run["predictions"] = ...
Alternatively, if you have a NeptuneCallback instance available, you can access its .run property:
trainer.train()
# logging additional metadata after training
neptune_callback.run["predictions"] = ...
Running multiple trainings for the same Trainer#
You can run multiple training sessions for the same Trainer object. However, each separate session will be logged to a new Neptune run.
For how to resume logging to an existing run, see Logging additional metadata after training.
training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    num_train_epochs=2,
    report_to="neptune",
)
trainer = Trainer(model, training_args)
# First training session
trainer.train()
# Option: logging auxiliary metadata to the 1st training session
trainer.log(my_metadata)
# Second training session
trainer.train()
Changing what the callback logs#
You can control what metadata is logged through the Neptune callback by passing some extra arguments.
For the full list of options, see API reference ≫ 🤗 Transformers.
training_args = TrainingArguments(
    ...,
    save_strategy="epoch",  # Sets the checkpoint save strategy
    report_to="none",
)
neptune_callback = NeptuneCallback(
    ...,
    tags=["args-callback", "thin"],
    capture_hardware_metrics=False,
    log_parameters=False,
    log_checkpoints=None,
)
trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()
Related
- What you can log and display
- What Neptune logs automatically
- API reference ≫ 🤗 Transformers integration
- NeptuneCallback in the 🤗 Transformers API reference
- 🤗 Transformers on GitHub