MosaicML Composer integration guide#

Custom dashboard displaying metadata logged with MosaicML Composer

MosaicML Composer is a PyTorch library for efficient neural network training. With the Neptune-Composer integration, you can automatically log your Composer training metadata to Neptune.

This guide shows how to:

  • Create a Neptune logger, to automatically track metadata.
  • Access the Neptune run from within the logger, to log metadata manually.

Quickstart#

  1. Set up Neptune in your environment.

    1. Sign up at neptune.ai/register.

    2. Create a project.

    3. Install the Neptune client library:

      pip install -U neptune
      
    4. Set your Neptune API token to the NEPTUNE_API_TOKEN environment variable.

      Linux/macOS:

      export NEPTUNE_API_TOKEN="uyVrZXkiOiIzNTd0Zj...ifQ=="

      Windows:

      setx NEPTUNE_API_TOKEN "uyVrZXkiOiIzNTd0Zj...ifQ=="

      Jupyter notebook:

      %env NEPTUNE_API_TOKEN="uyVrZXkiOiIzNTd0Zj...ifQ=="
      
    5. Set the name of your Neptune project to the NEPTUNE_PROJECT environment variable.

      You can copy the full name from your project details.

      Linux/macOS:

      export NEPTUNE_PROJECT="workspace-name/project-name"

      Windows:

      setx NEPTUNE_PROJECT "workspace-name/project-name"

      Jupyter notebook:

      %env NEPTUNE_PROJECT="workspace-name/project-name"
      

    For detailed instructions, see Getting started.

  2. Create the logger:

    from composer import Trainer
    from composer.loggers import NeptuneLogger
    
    neptune_logger = NeptuneLogger()
    
  3. Pass the logger to the loggers argument of the trainer:

    trainer = Trainer(
        ...,  # your model, dataloaders, and other arguments
        loggers=neptune_logger,
    )
    

Success

Neptune logging will be enabled when you run your Trainer with trainer.fit().

Full walkthrough#

This guide walks you through connecting Neptune to your machine learning scripts and analyzing some logged metadata.

Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.
  • Set your Neptune credentials (API token and target project name).

    Linux/macOS:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="

    export NEPTUNE_PROJECT="ml-team/classification"

    Windows:

    setx NEPTUNE_API_TOKEN "h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="

    setx NEPTUNE_PROJECT "ml-team/classification"

    You can also navigate to Settings → Edit the system environment variables and add the variables there.

    Jupyter notebook:

    %env NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="

    %env NEPTUNE_PROJECT="ml-team/classification"
    
    How do I find my Neptune credentials?
    • API token: In the bottom-left corner of the Neptune app, open your user menu and select Get your API token. If you need the token of a service account, go to the workspace or project settings and select Service accounts.
    • Project name: Your full project name has the form workspace-name/project-name. You can copy it from the project menu (Details & privacy).
      • You can also find a pre-filled project string in Experiments → Create a new run.
  • Install the Neptune client library and MosaicML Composer on your system.

    pip install neptune mosaicml
    

If you'd rather follow the guide without any setup, you can run the example in Colab.

Adding NeptuneLogger to the Composer script#

Composer logs metadata through a unified interface: loggers. You can learn more about logger support in the Composer docs.

To enable Neptune logging:

  1. Create a NeptuneLogger instance:

    from composer.loggers import NeptuneLogger
    
    neptune_logger = NeptuneLogger()  # (1)
    
    1. If you haven't registered, you can try the integration anonymously:

      from neptune import ANONYMOUS_API_TOKEN
      
      neptune_logger = NeptuneLogger(
          project="common/mosaicml-composer",
          api_token=ANONYMOUS_API_TOKEN,
      )
      
  2. (optional) By default, the metadata captured by the logger goes into a namespace called training.

    To change this, pass another name to the base_namespace argument:

    neptune_logger = NeptuneLogger(
        base_namespace="my-namespace-name",
    )
    
  3. (optional) By default, the Neptune logger is enabled only on the rank zero process.

    To log on all ranks, set the rank_zero_only argument to False:

    neptune_logger = NeptuneLogger(
        rank_zero_only=False,
    )
    
  4. (optional) By default, the logger does not upload checkpoints to Neptune. To upload them, set the upload_checkpoints argument to True:

    neptune_logger = NeptuneLogger(
        upload_checkpoints=True,
    )
    
  5. Pass the logger to the trainer:

    from composer import Trainer

    trainer = Trainer(
        ...,  # your model, dataloaders, and other arguments
        loggers=neptune_logger,
    )
    
  6. Run the trainer:

    trainer.fit()
    
  7. Run your script:

    python main.py
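
The optional arguments from steps 2 to 4 can be combined in a single NeptuneLogger call. The following is a minimal sketch of one way to collect them, not the integration's own code. The helpers build_logger_kwargs and create_logger are hypothetical names introduced for illustration. Only base_namespace, rank_zero_only, and upload_checkpoints come from this guide.

```python
def build_logger_kwargs(base_namespace=None, log_all_ranks=False, upload_checkpoints=False):
    """Collect only the NeptuneLogger options that differ from the defaults."""
    kwargs = {}
    if base_namespace is not None:
        # Replaces the default "training" namespace
        kwargs["base_namespace"] = base_namespace
    if log_all_ranks:
        # The logger is enabled only on rank zero unless this is set to False
        kwargs["rank_zero_only"] = False
    if upload_checkpoints:
        # Checkpoints are not uploaded unless requested
        kwargs["upload_checkpoints"] = True
    return kwargs


def create_logger(**options):
    """Create a NeptuneLogger from the collected options (hypothetical helper)."""
    # Imported lazily so this module also loads where Composer is not installed
    from composer.loggers import NeptuneLogger

    return NeptuneLogger(**build_logger_kwargs(**options))
```

For example, create_logger(base_namespace="my-namespace-name", upload_checkpoints=True) would pass both options to the logger in one call.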
    
If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"

Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

import neptune

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # (1)
    project="ml-team/classification",  # (2)
)
  1. In the bottom-left corner, expand the user menu and select Get my API token.
  2. You can copy the path from the project details (Details & privacy).

If you haven't registered, you can log anonymously to a public project:

api_token=neptune.ANONYMOUS_API_TOKEN
project="common/quickstarts"

Make sure not to publish sensitive data through your code!
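
To catch a missing token or project name before training starts, a script could check the environment first. The following is a minimal sketch that uses only the Python standard library. The helper names missing_neptune_variables and check_neptune_environment are hypothetical and not part of Neptune or Composer.

```python
import os

# Environment variables that this guide asks you to set
REQUIRED_VARIABLES = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_variables(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


def check_neptune_environment(env=None):
    """Raise a readable error if a required variable is missing."""
    missing = missing_neptune_variables(env)
    if missing:
        raise RuntimeError(
            "Set these environment variables first: " + ", ".join(missing)
        )
```

Calling check_neptune_environment() at the top of the script, before the logger is created, would surface the problem early. The check does not apply if you pass api_token and project directly as arguments.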

Once the Neptune logger is created, a link appears in the console output.

Click the link to open the run in Neptune. You'll see the metadata appear as it gets logged.

Analyzing the logged metadata in Neptune#

To browse the metadata:

  1. Follow the Neptune link in the console output.

    The All metadata tab opens.

  2. Click training (or the name of your custom namespace, if you specified it when creating the logger).

To visualize and display the metadata, you can create a dashboard with customizable widgets.

Comparing runs against each other#

Toggle the eye icons to select runs for comparison.

You can also create comparison-specific widgets, such as scatter plots.

More options#

Passing Neptune keyword arguments#

The Neptune logger accepts neptune.init_run() arguments. You can use them to supply more details, customize the behavior, or disable auto-logging that is enabled by default.

Example
from composer.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(
    project="ml-team/nli-project",
    name="radiant-swallow",  # (1)
    description="Quick test run with MosaicML Composer",
    tags=["composer", "test"],
    source_files="*.py",
    dependencies="infer",
    git_ref=False,
)
  1. Sets a custom name, which you can use as a human-friendly ID.

    To display it in the app, add sys/name as a column.

    You can also edit the name in the run information view (Run information).

init_run() parameters list

See in API reference: neptune.init_run()

Name | Type | Default | Description
project | str, optional | None | Name of a project in the form workspace-name/project-name. If None, the value of the NEPTUNE_PROJECT environment variable is used.
api_token | str, optional | None | Your Neptune API token (or a service account's API token). If None, the value of the NEPTUNE_API_TOKEN environment variable is used. To keep your token secure, avoid placing it in source code. Instead, save it as an environment variable.
with_id | str, optional | None | The Neptune identifier of an existing run to resume, such as "CLS-11". The identifier is stored in the object's sys/id field. If omitted or None is passed, a new tracked run is created.
custom_run_id | str, optional | None | A unique identifier that can be used to log metadata to a single run from multiple locations. Max length: 36 characters. If None and the NEPTUNE_CUSTOM_RUN_ID environment variable is set, Neptune will use that as the custom_run_id value. For details, see Set custom run ID.
mode | str, optional | async | Connection mode in which the logging will work. Possible values are async, sync, offline, read-only, and debug. If you leave it out, the value of the NEPTUNE_MODE environment variable is used. If that's not set, the default async is used.
name | str, optional | Neptune ID | Custom name for the run. You can use it as a human-readable ID and add it as a column in the experiments table (sys/name). If left empty, once the run is synchronized with the server, Neptune sets the auto-generated identifier (sys/id) as the name.
description | str, optional | "" | Editable description of the run. You can add it as a column in the experiments table (sys/description).
tags | list, optional | [] | Must be a list of str that represent the tags for the run. You can edit them after the run is created, either in the run information or in the experiments table.
source_files | list or str, optional | None | List of source files to be uploaded. Must be a list of str or a single str. Uploaded sources are displayed in the Source code section of the run. If None is passed, the Python file from which the run was created will be uploaded. When resuming a run, no file will be uploaded by default. Pass an empty list ([]) to upload no files. Unix style pathname pattern expansion is supported. For example, you can pass "*.py" to upload all Python source files from the current directory. Paths of uploaded files are resolved relative to the calculated common root of all uploaded source files. For recursive lookup, use "**/*.py" (for Python 3.5 and later). For details, see the glob library.
capture_stdout | Boolean, optional | True | Whether to log the standard output stream. Logged in the monitoring namespace.
capture_stderr | Boolean, optional | True | Whether to log the standard error stream. Logged in the monitoring namespace.
capture_hardware_metrics | Boolean, optional | True | Whether to track hardware consumption (CPU, GPU, memory utilization). Logged in the monitoring namespace.
fail_on_exception | Boolean, optional | True | If an uncaught exception occurs, whether to set the run's Failed state to True.
monitoring_namespace | str, optional | "monitoring" | Namespace inside which all monitoring logs will be stored.
flush_period | float, optional | 5 (seconds) | In asynchronous (default) connection mode, how often Neptune should trigger disk flushing.
proxies | dict, optional | None | Argument passed to HTTP calls made via the Requests library. For details on proxies, see the Requests documentation.
capture_traceback | Boolean, optional | True | In case of an exception, whether to log the traceback of the run.
git_ref | GitRef or Boolean | None | GitRef object containing information about the Git repository path. If None, Neptune looks for a repository in the path of the script that is executed. To specify a different location, set to GitRef(repository_path="path/to/repo"). To turn off Git tracking for the run, set to GitRef.DISABLED or False. For examples, see Logging Git info.
dependencies | str, optional | None | Tracks environment requirements. If you pass "infer" to this argument, Neptune logs dependencies installed in the current environment. You can also pass a path to your dependency file directly. If left empty, no dependency file is uploaded.
async_lag_callback | NeptuneObjectCallback, optional | None | Custom callback function which is called if the lag between a queued operation and its synchronization with the server exceeds the duration defined by async_lag_threshold. The callback should take a Run object as the argument and can contain any custom code, such as calling stop() on the object. Note: Instead of using this argument, you can use Neptune's default callback by setting the NEPTUNE_ENABLE_DEFAULT_ASYNC_LAG_CALLBACK environment variable to TRUE.
async_lag_threshold | float, optional | 1800.0 (seconds) | Duration between the queueing and synchronization of an operation. If a lag callback (default callback enabled via environment variable or custom callback passed to the async_lag_callback argument) is enabled, the callback is called when this duration is exceeded.
async_no_progress_callback | NeptuneObjectCallback, optional | None | Custom callback function which is called if there has been no synchronization progress whatsoever for the duration defined by async_no_progress_threshold. The callback should take a Run object as the argument and can contain any custom code, such as calling stop() on the object. Note: Instead of using this argument, you can use Neptune's default callback by setting the NEPTUNE_ENABLE_DEFAULT_ASYNC_NO_PROGRESS_CALLBACK environment variable to TRUE.
async_no_progress_threshold | float, optional | 300.0 (seconds) | For how long there has been no synchronization progress. If a no-progress callback (default callback enabled via environment variable or custom callback passed to the async_no_progress_callback argument) is enabled, the callback is called when this duration is exceeded.
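
The source_files argument accepts Unix-style patterns, such as the "*.py" used in the example above. To preview which files a pattern would pick up, you can try it with Python's glob module. This is a minimal sketch that only illustrates pattern semantics. The helpers match_patterns and demo are hypothetical, the file names are made up, and Neptune's own resolution of source files may differ in details.

```python
import glob
import os
import tempfile


def match_patterns(root, patterns):
    """Return sorted paths under root, relative to root, matched by the patterns."""
    matched = set()
    for pattern in patterns:
        # recursive=True lets "**" span any number of nested directories
        for path in glob.glob(os.path.join(root, pattern), recursive=True):
            matched.add(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(matched)


def demo():
    """Build a small throwaway project tree and compare two patterns."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "utils"))
        for name in ("main.py", "notes.txt", os.path.join("utils", "data.py")):
            open(os.path.join(root, name), "w").close()
        flat = match_patterns(root, ["*.py"])          # top-level Python files only
        recursive = match_patterns(root, ["**/*.py"])  # Python files at any depth
    return flat, recursive
```

In this sketch, the flat pattern matches only main.py, while the recursive pattern also matches utils/data.py.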

Logging after fitting is finished#

You can access the Neptune run outside of the Trainer context, which lets you log metadata after the fitting is finished.

To track more metadata under the logger namespace (base_namespace), use the .base_handler property to access a namespace handler:

metadata = ...
neptune_logger.base_handler["your/metadata/structure"] = metadata

You can operate on this handler just like on a run object. Besides simple assignment (=, which is equivalent to assign()), you can use any logging method from the Neptune client library to track your metadata, such as append(), upload(), or track_files().

Accessing the entire Run object

You can also access the Neptune run object directly with the .neptune_run property. This can be handy if you want to use a custom namespace structure or you want to operate directly on the Run instance.

Generic recipe:

metadata = ...
neptune_logger.neptune_run["your/metadata/structure"] = metadata

# Use a Run method
neptune_logger.neptune_run.print_structure()
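
The practical difference between the two properties is where the field ends up in the run. The following sketch models that with plain strings, assuming that base_handler points at the logger's base namespace (training by default) and that neptune_run writes relative to the run's root. The function resolve_field_path is a hypothetical illustration, not part of the Neptune or Composer API.

```python
def resolve_field_path(relative_path, base_namespace=None):
    """Join an optional base namespace and a relative path into one field path."""
    segments = []
    for part in (base_namespace, relative_path):
        if part:
            # Drop empty segments caused by leading or trailing slashes
            segments.extend(s for s in part.split("/") if s)
    return "/".join(segments)


# Written through base_handler: nested under the logger namespace
via_base_handler = resolve_field_path("your/metadata/structure", "training")

# Written through neptune_run: placed at the root of the run
via_neptune_run = resolve_field_path("your/metadata/structure")
```

Under these assumptions, the first path is training/your/metadata/structure and the second is your/metadata/structure.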

Example: Logging dataset samples#

Assume you have some datasets prepared:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Assumption: a basic tensor conversion; substitute your own transform pipeline
transform = transforms.ToTensor()

train_dataset = datasets.MNIST(
    "data", download=True, train=True, transform=transform
)
eval_dataset = datasets.MNIST(
    "data", download=True, train=False, transform=transform
)
...

trainer.fit()

You could upload samples of the train and eval datasets as follows:

from neptune.types import File

neptune_logger.neptune_run["images/training"].extend(
    [File.as_image(img) for img in train_dataset.data[:50]]
)
neptune_logger.neptune_run["images/eval"].extend(
    [File.as_image(img) for img in eval_dataset.data[:50]]
)

The above would log the samples as a series of images under the images namespace of the run.

