🤗 Transformers#

Custom dashboard displaying metadata logged with Transformers

🤗 Transformers by Hugging Face is a popular framework for model training in the natural language processing domain.

You can integrate metadata tracking with Neptune either by:

  • passing report_to="neptune" to the Trainer arguments.
  • setting up a Neptune callback and passing it to the Trainer callbacks.

Before you start#

Using the integration#

In your model-training script, import NeptuneCallback:

from transformers.integrations import NeptuneCallback

You have a couple of options for enabling Neptune logging in your script:

  • In your TrainingArguments, set the report_to argument to "neptune":

    training_args = TrainingArguments(
        "quick-training-distilbert-mrpc", 
        evaluation_strategy="steps",
        eval_steps=20,
        report_to="neptune",
    )
    trainer = Trainer(
        model,
        training_args,
        ...
    )
    
  • Alternatively, for more logging options, create a Neptune callback:

    neptune_callback = NeptuneCallback()
    

    Tip

    To add more detail to the tracked run, you can supply optional arguments to NeptuneCallback. See the example below.

    Then pass the callback to the Trainer:

    training_args = TrainingArguments(..., report_to=None)  # (1)
    trainer = Trainer(
        model,
        training_args,
        ...,
        callbacks=[neptune_callback],
    )
    
    1. To avoid creating several callbacks, set the report_to argument to None. This will be the default in upcoming releases of 🤗 Transformers.

Now, when you start the training with trainer.train(), your metadata will be logged in Neptune.

Note for Jupyter Notebook

In notebooks and other interactive environments, you normally need to stop the Neptune run object manually with run.stop(). With this integration, however, the run is stopped automatically when trainer.train() finishes, so no manual stopping is required.

More options#

Passing a Neptune run to the callback#

If you initialize a Neptune run first, you can pass that to the callback.

import neptune
from transformers.integrations import NeptuneCallback

training_args = TrainingArguments(..., report_to=None)

run = neptune.init_run(
    name="distilbert-mrpc", description="DistilBERT fine-tuned on GLUE/MRPC"
)

neptune_callback = NeptuneCallback(run=run)
trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()

Related

API: init_run()

Logging additional metadata after training#

The Neptune run is stopped automatically when trainer.train() finishes. If you wish to continue logging to the run, you have a couple of options:

Option 1: Use the trainer method#

You can log additional metadata with trainer.log():

trainer.train()

# Log additional metadata after training.
# trainer.log() takes a dict of metrics; the values here are just examples.
extra_metadata = {"test/accuracy": 0.91, "test/f1": 0.89}
trainer.log(extra_metadata)
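Note that trainer.log() expects a flat dictionary of scalar metric values. If your post-training metadata mixes scalars with other objects, a small helper can filter out the non-scalar entries first. The helper below is a hypothetical sketch, not part of Transformers or Neptune:

```python
# Hypothetical helper (not part of Transformers or Neptune): keep only scalar
# entries, since Trainer.log() expects a Dict[str, float]-style metrics dict.
def scalar_metrics(metadata):
    return {
        key: float(value)
        for key, value in metadata.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
```

For example, scalar_metrics({"acc": 0.9, "run_name": "mrpc", "n_epochs": 3}) keeps only the "acc" and "n_epochs" entries, converted to floats.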

Option 2: Get the run object and use the Neptune API#

If you used a Neptune callback, you can also resume the run with the get_run() method:

trainer.train()

# logging additional metadata after training
run = NeptuneCallback.get_run(trainer)
run["predictions"] = ...

Alternatively, if you have a NeptuneCallback instance available, you can access its .run property:

trainer.train()

# logging additional metadata after training
neptune_callback.run["predictions"] = ...

Running multiple trainings for the same Trainer#

You can run multiple training sessions for the same Trainer object. However, each separate session will be logged to a new Neptune run.

For how to resume logging to an existing run, see Logging additional metadata after training.

training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    num_train_epochs=2,
    report_to="neptune",
)

trainer = Trainer(model, training_args)

# First training session
trainer.train()

# Optional: log auxiliary metadata for the first training session
trainer.log(my_metadata)

# Second training session
trainer.train()

Changing what the callback logs#

You can control which metadata the Neptune callback logs by passing it extra arguments.

training_args = TrainingArguments(
    ...,
    save_strategy="epoch",  # Sets the checkpoint save strategy
    report_to=None,
)

neptune_callback = NeptuneCallback(
    ...,
    tags=["args-callback", "thin"],
    capture_hardware_metrics=False,
    log_parameters=False,
    log_checkpoints=None,
)

trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()

API reference#

NeptuneCallback()#

Creates a Neptune callback that you pass to the callbacks argument of the Trainer constructor.

Parameters#

  • api_token (str, optional; default None): Neptune API token obtained upon registration. You can leave this argument out if you have saved your token to the NEPTUNE_API_TOKEN environment variable (strongly recommended).
  • project (str, optional; default None): Name of an existing Neptune project, in the form "workspace-name/project-name". You can find and copy the name from the project settings → Properties in Neptune. If None, the value of the NEPTUNE_PROJECT environment variable is used. To try logging anonymously, you can use the public project "common/huggingface-integration".
  • name (str, optional; default None): Custom name for the run.
  • base_namespace (str, optional; default "finetuning"): The root namespace (folder) in the Neptune run that will contain all of the logged metadata.
  • run (Run, optional; default None): Pass a Neptune run object if you want to continue logging to an existing run. See Logging to existing object and Passing object between files.
  • log_parameters (bool, optional; default True): Log all Trainer arguments and model parameters provided by the Trainer.
  • log_checkpoints (str, optional; default None): Controls how model checkpoints are uploaded:
      ◦ If "same", uploads checkpoints whenever they are saved by the Trainer.
      ◦ If "last", uploads only the most recently saved checkpoint.
      ◦ If "best", uploads the best checkpoint (among the ones saved by the Trainer).
      ◦ If None, does not upload checkpoints.
  • **neptune_run_kwargs (optional): Additional keyword arguments passed directly to the init_run() method when a new run is created.
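The four log_checkpoints modes can be easy to mix up. As a rough mental model only (this is an illustrative sketch, not Neptune's actual implementation), the selection behaves like:

```python
# Illustrative sketch only (not Neptune's actual code): which of the
# checkpoints saved by the Trainer each log_checkpoints setting would upload.
# `saved` is a list of (global_step, metric) pairs, one per saved checkpoint.
def checkpoints_to_upload(saved, mode, greater_is_better=True):
    if mode is None or not saved:
        return []  # checkpoint uploading disabled
    if mode == "same":
        return list(saved)  # upload every checkpoint as it is saved
    if mode == "last":
        return [max(saved, key=lambda c: c[0])]  # most recent step only
    if mode == "best":
        best = max if greater_is_better else min
        return [best(saved, key=lambda c: c[1])]  # best metric value
    raise ValueError(f"Unknown log_checkpoints value: {mode!r}")
```

For instance, with saved checkpoints at steps 100, 200, and 300, mode "last" would pick the step-300 checkpoint regardless of metric, while "best" would pick whichever checkpoint scored best.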

Example#

from transformers.integrations import NeptuneCallback

# Create Neptune callback
neptune_callback = NeptuneCallback(
    name="DistilBERT",
    description="DistilBERT fine-tuned on GLUE/MRPC",
    tags=["args-callback", "fine-tune", "MRPC"],  # tags help you manage runs
    base_namespace="custom_name",  # the default is "finetuning"
    log_checkpoints="best",  # other options are "last", "same", and None
    capture_hardware_metrics=False,  # additional kwargs for a Neptune run
)

# Create training arguments
training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    eval_steps=20,
    report_to=None,
)

# Pass Neptune callback to Trainer
trainer = Trainer(
    model,
    training_args,
    callbacks=[neptune_callback],
)

trainer.train()