Working with 🤗 Transformers#
Neptune compatibility note

This integration has not yet been updated for neptune 1.x and requires using neptune-client <1.0.0.
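For example, the pinned client can be installed with pip (a sketch; any 0.x release of the client satisfies the pin):

```shell
# Pin the Neptune client below 1.0, as required by this integration
pip install "neptune-client<1.0.0"
```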
🤗 Transformers by Hugging Face is a popular framework for model training in the natural language processing domain.
You can integrate metadata tracking with Neptune either by:
- passing `report_to="neptune"` to the Trainer arguments.
- setting up a Neptune callback and passing it to the Trainer callbacks.
See example in Neptune · Code examples

Related
- API reference ≫ 🤗 Transformers
- `NeptuneCallback` in the 🤗 Transformers API reference
- 🤗 Transformers on GitHub
Before you start#
- Set up Neptune.
Using the integration#
In your model-training script, import `NeptuneCallback`:
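The import itself (the same one used in the examples later on this page) looks like this:

```python
from transformers.integrations import NeptuneCallback
```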
You have a couple of options for enabling Neptune logging in your script:
In your `TrainingArguments`, set the `report_to` argument to `"neptune"`:
```python
training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    eval_steps=20,
    report_to="neptune",
)
trainer = Trainer(
    model,
    training_args,
    ...,
)
```
In this case, your Neptune credentials must be available through environment variables.
How do I save my credentials as environment variables?
Set your Neptune API token and full project name to the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables, respectively. For example:
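A sketch for Linux/macOS shells; the values below are placeholders for your actual credentials:

```shell
# Replace the placeholders with your own API token and project name
export NEPTUNE_API_TOKEN="your-api-token"
export NEPTUNE_PROJECT="workspace-name/project-name"
```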
- On Windows, the command is `set` instead of `export`.
Finding your credentials:
- API token: In the top-right corner of the Neptune app, click your avatar and select Get your API token.
- Project: Your full project name has the form `workspace-name/project-name`. To copy the name, navigate to your project → Settings → Properties.
If you're working in Colab, you can set your credentials with the `os` and `getpass` libraries:
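A minimal sketch (the prompt string and project name are placeholders):

```python
import os
from getpass import getpass

# Prompt for the token so it isn't stored in the notebook
os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"
```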
Alternatively, for more logging options, create a Neptune callback:
Tip

To add more detail to the tracked run, you can supply optional arguments to `NeptuneCallback`. See Changing what the callback logs.
Then pass the callback to the Trainer:
```python
training_args = TrainingArguments(..., report_to="none")  # (1)!
trainer = Trainer(
    model,
    training_args,
    ...,
    callbacks=[neptune_callback],
)
```
1. To avoid creating several callbacks, set the `report_to` argument to `"none"`. This will be the default behavior in version 5 of 🤗 Transformers.
Now, when you start the training with `trainer.train()`, your metadata will be logged in Neptune.
Note for Jupyter Notebook

Normally in notebooks and other interactive environments, you need to manually stop the Neptune run object with `run.stop()`. In this case, the Neptune run is stopped automatically when `trainer.train()` finishes, so manual stopping is not required.
More options#
Passing a Neptune run to the callback#
If you initialize a Neptune run first, you can pass that to the callback.
```python
import neptune.new as neptune  # neptune-client <1.0.0 exposes the new API under neptune.new
from transformers.integrations import NeptuneCallback

training_args = TrainingArguments(..., report_to="none")
run = neptune.init_run(
    name="distilbert-mrpc",
    description="DistilBERT fine-tuned on GLUE/MRPC",
)
neptune_callback = NeptuneCallback(run=run)
trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()
```
Related
API ≫ init_run()
Logging additional metadata after training#
The Neptune run is stopped automatically when `trainer.train()` finishes. If you wish to continue logging to the run, you have a couple of options:
Option 1: Use the trainer method#
You can log additional metadata with `trainer.log()`:
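Continuing the training script, a sketch (the metric name and value here are illustrative, and `trainer` is the Trainer from the examples above):

```python
trainer.train()

# Log extra metadata to the same Neptune run via the trainer
trainer.log({"my_metric": 0.92})
```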
Option 2: Get the run object and use the Neptune API#
If you used a Neptune callback, you can also resume the run with the `get_run()` method:
```python
trainer.train()

# Logging additional metadata after training
run = NeptuneCallback.get_run(trainer)
run["predictions"] = ...
```
Alternatively, if you have a `NeptuneCallback` instance available, you can access its `.run` property:
```python
trainer.train()

# Logging additional metadata after training
neptune_callback.run["predictions"] = ...
```
Running multiple trainings for the same Trainer#
You can run multiple training sessions for the same Trainer object. However, each separate session will be logged to a new Neptune run.
For how to resume logging to an existing run, see Logging additional metadata after training.
```python
training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    num_train_epochs=2,
    report_to="neptune",
)
trainer = Trainer(model, training_args)

# First training session
trainer.train()

# Option: logging auxiliary metadata to the 1st training session
trainer.log(my_metadata)

# Second training session
trainer.train()
```
Changing what the callback logs#
You can control what metadata is logged through the Neptune callback by passing some extra arguments.
For the full list of options, see API reference ≫ 🤗 Transformers.
```python
training_args = TrainingArguments(
    ...,
    save_strategy="epoch",  # Sets the checkpoint save strategy
    report_to="none",
)
neptune_callback = NeptuneCallback(
    ...,
    tags=["args-callback", "thin"],
    capture_hardware_metrics=False,
    log_parameters=False,
    log_checkpoints=None,
)
trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()
```