# Working with 🤗 Transformers
Neptune compatibility note
This integration has not yet been updated for neptune 1.x and requires using the legacy neptune-client package.
🤗 Transformers by Hugging Face is a popular framework for model training in the natural language processing domain.
You can integrate metadata tracking with Neptune either by:
- passing `report_to="neptune"` to the Trainer arguments, or
- setting up a Neptune callback and passing it to the Trainer callbacks.
See example in Neptune

Related:

- Code examples
- API reference ≫ 🤗 Transformers
- NeptuneCallback in the 🤗 Transformers API reference
- 🤗 Transformers on GitHub
## Before you start
- Set up Neptune. For instructions, see the installation and setup guide in the Neptune documentation.
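A minimal sketch of a typical install, assuming you are using the legacy client as noted in the compatibility note above (adjust the version constraint to your environment):

```bash
pip install transformers "neptune-client<1.0"
```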
## Using the integration
In your model-training script, import Neptune.
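With the legacy neptune-client package, the import typically looks like this (shown as an assumption based on the compatibility note above):

```python
import neptune.new as neptune  # legacy neptune-client (<1.0) import style
```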
You have a couple of options for enabling Neptune logging in your script:
- In your `TrainingArguments`, set the `report_to` argument to `"neptune"`:
```python
training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    eval_steps=20,
    report_to="neptune",
)

trainer = Trainer(
    model,
    training_args,
    ...,
)
```
In this case, your Neptune credentials must be available through environment variables.
How do I save my credentials as environment variables?
Set your Neptune API token and full project name to the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables, respectively.
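For example, in a terminal on Linux or macOS (the values below are placeholders; substitute your own token and project name):

```bash
export NEPTUNE_API_TOKEN="<your-api-token>"
export NEPTUNE_PROJECT="workspace-name/project-name"
```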
- On Windows, the command is `set` instead of `export`.
Finding your credentials:
- API token: In the top-right corner of the Neptune app, click your avatar and select Get your API token.
- Project: Your full project name has the form `workspace-name/project-name`. To copy the name, navigate to your project → Settings → Properties.
If you're working in Colab, you can set your credentials with the os and getpass libraries:
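A minimal sketch (the project name below is a placeholder; replace it with your own):

```python
import os
from getpass import getpass

# Prompt for the API token so it isn't stored in the notebook
os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"
```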
Alternatively, for more logging options, create a Neptune callback:
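A minimal example; with no arguments, the callback picks up your Neptune credentials from the environment variables set above:

```python
from transformers.integrations import NeptuneCallback

neptune_callback = NeptuneCallback()
```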
To add more detail to the tracked run, you can supply optional arguments to `NeptuneCallback`. See Changing what the callback logs.
Then pass the callback to the Trainer:
```python
training_args = TrainingArguments(..., report_to="none")  # (1)!

trainer = Trainer(
    model,
    training_args,
    ...,
    callbacks=[neptune_callback],
)
```
- To avoid creating several callbacks, set the `report_to` argument to `"none"`. This will be the default behavior in version 5 of 🤗 Transformers.
Now, when you start the training with `trainer.train()`, your metadata will be logged in Neptune.
Note for Jupyter Notebook
Normally, in notebooks and other interactive environments, you need to manually stop the Neptune run object with `run.stop()`. In this case, the Neptune run is stopped automatically when `trainer.train()` finishes, so manual stopping is not required.
## Passing a Neptune run to the callback
If you initialize a Neptune run first, you can pass that to the callback.
```python
import neptune.new as neptune  # legacy neptune-client import style
from transformers.integrations import NeptuneCallback

training_args = TrainingArguments(..., report_to="none")

run = neptune.init_run(
    name="distilbert-mrpc",
    description="DistilBERT fine-tuned on GLUE/MRPC",
)

neptune_callback = NeptuneCallback(run=run)

trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()
```
## Logging additional metadata after training
The Neptune run is stopped automatically when `trainer.train()` finishes. If you wish to continue logging to the run, you have a couple of options:
### Option 1: Use the trainer method
You can log additional metadata with `trainer.log()`:
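A minimal sketch, assuming `my_metadata` is a dictionary of values you want to log:

```python
trainer.train()

# logging additional metadata after training
trainer.log(my_metadata)
```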
### Option 2: Get the run object and use the Neptune API
If you used a Neptune callback, you can also resume the run with the `NeptuneCallback.get_run()` class method:
```python
trainer.train()

# logging additional metadata after training
run = NeptuneCallback.get_run(trainer)
run["predictions"] = ...
```
Alternatively, if you have a `NeptuneCallback` instance available, you can access its `run` attribute:
```python
trainer.train()

# logging additional metadata after training
neptune_callback.run["predictions"] = ...
```
## Running multiple trainings for the same Trainer
You can run multiple training sessions for the same Trainer object. However, each separate session will be logged to a new Neptune run.
For how to resume logging to an existing run, see Logging additional metadata after training.
```python
training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    num_train_epochs=2,
    report_to="neptune",
)

trainer = Trainer(model, training_args)

# First training session
trainer.train()

# Option: logging auxiliary metadata to the 1st training session
trainer.log(my_metadata)

# Second training session
trainer.train()
```
## Changing what the callback logs
You can control what metadata is logged through the Neptune callback by passing some extra arguments.
For the full list of options, see API reference ≫ 🤗 Transformers.
```python
training_args = TrainingArguments(
    ...,
    save_strategy="epoch",  # Sets the checkpoint save strategy
    report_to="none",
)

neptune_callback = NeptuneCallback(
    ...,
    tags=["args-callback", "thin"],
    capture_hardware_metrics=False,
    log_parameters=False,
    log_checkpoints=None,
)

trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()
```