
🤗 Transformers integration guide#


Custom dashboard displaying metadata logged with Transformers

🤗 Transformers by Hugging Face is a popular framework for model training in the natural language processing domain. With Neptune, you can log, store, display, and compare your model-building metadata.

See example in Neptune · Code examples

Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.

  • Have Neptune and 🤗 Transformers installed:

    With pip:

    pip install -U neptune transformers

    With conda:

    conda install -c conda-forge neptune transformers
    

To see how the integration works without setting up your environment, run the example in Colab.

Setting up the integration#

You can integrate the metadata tracking with Neptune in two ways:

  • By passing report_to="neptune" to the Trainer arguments.
  • By configuring a Neptune callback.

Use Trainer arguments#

Use report_to="neptune" to set up a basic integration with Neptune and log only the default metadata. You don't have to initialize a Neptune run in your code, as the integration does it for you.

To integrate through Trainer arguments:

  1. Save your Neptune API token and full project name as environment variables.

    How do I save my credentials as environment variables?

    Set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

    Linux/macOS:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
    export NEPTUNE_PROJECT="ml-team/classification"

    Windows:

    setx NEPTUNE_API_TOKEN "h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
    setx NEPTUNE_PROJECT "ml-team/classification"

    You can also navigate to Settings → Edit the system environment variables and add the variables there.

    Jupyter Notebook (omit the quotes, as %env stores them literally):

    %env NEPTUNE_API_TOKEN=h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ==
    %env NEPTUNE_PROJECT=ml-team/classification

    To find your credentials:

    • API token: In the bottom-left corner of the Neptune app, expand your user menu and select Get your API token. If you need the token of a service account, go to the workspace or project settings and enter the Service accounts settings.
    • Project name: Your full project name has the form workspace-name/project-name. You can copy it from the project menu (Details & privacy).

    If you're working in Google Colab, you can set your credentials with the os and getpass libraries:

    import os
    from getpass import getpass
    os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
    os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"
    
  2. In your model training script, import Neptune:

    import neptune
    
  3. In TrainingArguments, set the report_to argument to "neptune":

    training_args = TrainingArguments(
        "quick-training-distilbert-mrpc",
        evaluation_strategy="steps",
        save_strategy="epoch",
        eval_steps=20,
        report_to="neptune",
        ...,
    )
    
    trainer = Trainer(
        model,
        training_args,
        ...,
    )
    
  4. Start the training with trainer.train().
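Put together, the steps above amount to a short script along these lines. This is a sketch, not a complete example: the checkpoint name is illustrative, and the dataset placeholders (`...`) stand in for your own tokenized splits.

```python
import neptune  # credentials are picked up from the environment variables

from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Illustrative checkpoint; substitute the model you are fine-tuning
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    save_strategy="epoch",
    eval_steps=20,
    report_to="neptune",  # enables the Neptune integration
)

trainer = Trainer(
    model,
    training_args,
    train_dataset=...,  # your tokenized training split
    eval_dataset=...,   # your tokenized evaluation split
)
trainer.train()
```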

Use the Neptune app to watch the training live as well as visualize, sort, and organize your logged metadata. For details, see Runs in Neptune.

Use a Neptune callback#

To have more control over the metadata logging, use a Neptune callback. With this approach, you can decide what kind of metadata to log, use custom namespaces, or log to an existing run.

To configure a Neptune callback:

  1. In your model training script, import Neptune and Neptune callback:

    import neptune
    from transformers.integrations import NeptuneCallback
    
  2. In TrainingArguments, set the report_to argument to "none".

    training_args = TrainingArguments(
        ..., 
        report_to="none" # (1)!
    )
    
    1. Setting report_to to "none" ensures the Trainer doesn't create a default Neptune callback on top of the one you configure yourself.
  3. Initialize a Neptune callback and specify where to log the metadata:

    neptune_callback = NeptuneCallback(
        project="your-workspace-name/your-project-name", # (1)!
        api_token="YourNeptuneApiToken", # (2)!
    )
    
    1. The full project name. For example, "ml-team/classification".

      • You can copy the name from the project details (Details & privacy).
      • You can also find a pre-filled project string in Experiments → Create a new run.
    2. In the bottom-left corner of the Neptune app, expand the user menu and select Get your API token.

      When you're done testing, save your API token as an environment variable instead of putting it here in the code!

  4. (Optional) To customize the way your run is tracked, pass extra arguments to NeptuneCallback:

    neptune_callback = NeptuneCallback(
        ...,
        tags=["args-callback", "thin"],
        capture_hardware_metrics=False,
        log_parameters=False,
        log_checkpoints=None,
    )
    

    For a full list of options, see the API reference.

  5. Pass the Neptune callback to the Trainer and start the training with trainer.train():

    trainer = Trainer(
        model,
        training_args,
        ...,
        callbacks=[neptune_callback]
    )
    
    trainer.train()
    

Use the Neptune app to watch the training live as well as visualize, sort, and organize your logged metadata. For details, see Runs in Neptune.
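The callback approach also covers the custom-namespace case mentioned above. A minimal sketch, assuming the callback's `base_namespace` argument (whose default in this integration is "finetuning") and an illustrative run name:

```python
from transformers import Trainer, TrainingArguments
from transformers.integrations import NeptuneCallback

training_args = TrainingArguments(..., report_to="none")

# Log the training metadata under "my-experiment/*" instead of the
# default "finetuning/*" namespace
neptune_callback = NeptuneCallback(
    base_namespace="my-experiment",  # custom namespace (assumed default: "finetuning")
    name="distilbert-mrpc",          # display name of the run
)

trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()
```

Credentials are read from the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, as in the previous section.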

More options#

Passing a Neptune run to the callback#

If you initialize a Neptune run first, you can pass it to the callback:

from transformers.integrations import NeptuneCallback
import neptune

training_args = TrainingArguments(..., report_to="none")

run = neptune.init_run(
    name="distilbert-mrpc", description="DistilBERT fine-tuned on GLUE/MRPC"
)

neptune_callback = NeptuneCallback(run=run)
trainer = Trainer(model, training_args, callbacks=[neptune_callback])
trainer.train()
Note for Jupyter Notebook

Normally in notebooks and other interactive environments, you need to manually stop the Neptune run object with run.stop(). With this integration, the Neptune run is stopped automatically when trainer.train() finishes, so you don't have to call run.stop().

Related

API reference: init_run()

Logging additional metadata after training#

A Neptune run stops automatically when trainer.train() finishes. If you wish to continue logging to the run, you have two options:

Option 1: Use the trainer method#

You can log additional metadata with trainer.log():

trainer.train()

# logging additional metadata after training
trainer.log(extra_metadata)
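trainer.log() expects a dictionary of scalar metrics. For instance (the metric names and values below are made up for illustration):

```python
# Hypothetical final metrics; Trainer.log() takes a dict of floats
extra_metadata = {
    "test/accuracy": 0.91,
    "test/f1": 0.89,
}

# Call after trainer.train() has finished:
# trainer.log(extra_metadata)
```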

Option 2: Get the run object and use the Neptune API#

If you use a Neptune callback, you can resume the run with the get_run() method:

trainer.train()

# logging additional metadata after training
run = NeptuneCallback.get_run(trainer)
run["predictions"] = ...

Alternatively, if you have a NeptuneCallback instance available, you can access its .run property:

trainer.train()

# logging additional metadata after training
neptune_callback.run["predictions"] = ...

Running multiple trainings for the same Trainer#

You can run multiple training sessions for the same Trainer object. However, each session is logged to a new Neptune run. For how to resume logging to an existing run, see Logging additional metadata after training.

training_args = TrainingArguments(
    "quick-training-distilbert-mrpc",
    evaluation_strategy="steps",
    num_train_epochs=2,
    report_to="neptune",
)

trainer = Trainer(model, training_args)

# First training session
trainer.train()

# Option: logging additional metadata to the first training session
trainer.log(my_metadata)

# Second training session
trainer.train()