Track artifacts

Previewing artifact metadata in Neptune

Instead of uploading entire files, you can track and version them in Neptune as artifacts. With the track_files() method, you can log metadata about datasets, models, and any other artifacts that can be stored as files.

An artifact can refer to a single file or to a collection of files. For instance, if you track a folder with files inside, Neptune logs the metadata of each individual file as well as of the whole folder.

You can track the following for each artifact:

- The URL and file path
- The MD5 hash
- Size
- Last modified
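To make the list above concrete, the following sketch gathers the same kind of per-file metadata for a folder using only the Python standard library. It is an illustration, not Neptune's implementation: the helper name describe_files and the record keys are hypothetical, and the sample files are created in a temporary directory.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_files(folder):
    """Return one metadata record per file under `folder` (hypothetical helper)."""
    records = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        records.append(
            {
                # Path relative to the tracked folder, with forward slashes
                "file_path": path.relative_to(folder).as_posix(),
                # MD5 of the file contents only
                "md5": hashlib.md5(path.read_bytes()).hexdigest(),
                "size": stat.st_size,
                "last_modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
            }
        )
    return records


# Build a small sample folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train.csv").write_bytes(b"id,label\n1,cat\n")
    (Path(tmp) / "images").mkdir()
    (Path(tmp) / "images" / "a.png").write_bytes(b"\x89PNG")
    records = describe_files(tmp)

for record in records:
    print(record["file_path"], record["size"], record["md5"])
```

Each record corresponds to one file inside the folder, which mirrors how a tracked folder carries metadata for every file it contains.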

About the MD5 hash

The hash of the artifact is calculated based on the file contents and metadata, such as the path, size, and last modification time. A change to any of these will result in a different hash, even if the file contents are exactly the same.

For details, see API reference ≫ Artifact.
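The effect described above can be reproduced with a small standalone sketch. The function illustrative_hash is hypothetical and is not Neptune's actual hashing scheme; it only shows why a hash that mixes in the path, size, and modification time changes when the file contents do not.

```python
import hashlib
import os
import tempfile


def illustrative_hash(path):
    """Hash file contents together with path, size, and modification time.

    Illustration only: this is NOT Neptune's actual hashing scheme.
    """
    stat = os.stat(path)
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        digest.update(handle.read())
    # Mix the metadata into the same digest as the contents.
    digest.update(f"{path}|{stat.st_size}|{stat.st_mtime_ns}".encode())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.csv")
    with open(path, "wb") as handle:
        handle.write(b"id,label\n1,cat\n")

    before = illustrative_hash(path)

    # Move the modification time back by one hour without touching the contents.
    stat = os.stat(path)
    os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns - 3_600_000_000_000))

    after = illustrative_hash(path)

print("Hash changed:", before != after)
```

In practice this means that copying or re-saving a dataset can produce a new artifact hash even when the data itself is identical.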

Example

Pass the path to a file or folder as an argument to the track_files() method:

Single file

run["train/dataset"].track_files("./datasets/train.csv")

Folder

run["train/images"].track_files("./datasets/images")

In the Neptune web app, open the run and navigate to the Artifacts tab.
Select artifacts to preview them and inspect the metadata.


Passing an absolute Windows file path

The file path is expected to be a URI, such as file://c:/path/to/file.

To correctly parse an absolute Windows path (C:\Path\to\file):

1. Work around the backslashes in one of the following ways:
   - Escape any backslashes
   - Convert the backslashes to forward slashes
   - Convert the file path to a raw string
2. Prepend file:// to the path.

For example:

import neptune

run = neptune.init_run()

path1 = "C:\\Path\\to\\file"
path2 = "C:/Path/to/file"
path3 = r"C:\Path\to\file"

run["artifact1"].track_files(f"file://{path1}")
run["artifact2"].track_files(f"file://{path2}")
run["artifact3"].track_files(f"file://{path3}")
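If paths arrive at runtime rather than as literals, the two steps can be wrapped in a small helper. This is a minimal sketch: windows_path_to_file_uri is a hypothetical name, not part of the Neptune API, and it produces the same file://C:/... form used in the example above. It relies on pathlib.PureWindowsPath, so it behaves the same on any operating system.

```python
from pathlib import PureWindowsPath


def windows_path_to_file_uri(path):
    """Build a file:// string from an absolute Windows path (hypothetical helper)."""
    windows_path = PureWindowsPath(path)
    if not windows_path.is_absolute():
        raise ValueError(f"Expected an absolute Windows path, got {path!r}")
    # as_posix() converts backslashes to forward slashes.
    return f"file://{windows_path.as_posix()}"


print(windows_path_to_file_uri(r"C:\Path\to\file"))  # file://C:/Path/to/file
print(windows_path_to_file_uri("C:/Path/to/file"))   # file://C:/Path/to/file
```

The returned string can then be passed to track_files() in place of the f-strings shown earlier.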

Tracking artifacts from S3-compatible storage

You can version datasets or models stored on Amazon S3 or compatible storage (s3://...), such as MinIO or Google Cloud Storage (GCS).

Amazon S3

You need to store your credentials for Amazon Web Services (AWS) as environment variables.

For example, on Amazon S3, configure an IAM group policy with "S3ReadAccessOnly" permissions.

Then, export the user access keys:

Linux

export AWS_SECRET_ACCESS_KEY='Your_AWS_key_here'
export AWS_ACCESS_KEY_ID='Your_AWS_ID_here'

macOS

export AWS_SECRET_ACCESS_KEY='Your_AWS_key_here'
export AWS_ACCESS_KEY_ID='Your_AWS_ID_here'

Windows

setx AWS_SECRET_ACCESS_KEY "Your_AWS_key_here"
setx AWS_ACCESS_KEY_ID "Your_AWS_ID_here"

Where to enter the command

- Linux: Command line
- macOS: Terminal app
- Windows: PowerShell or Command Prompt
- Jupyter Notebook: In a cell, prefixed with an exclamation mark: ! your-command-here
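Before tracking S3 artifacts, it can help to confirm that the variables are actually visible to the Python process. The check below is a minimal sketch using only the standard library; missing_credentials is a hypothetical helper, not a Neptune function.

```python
import os

REQUIRED_VARIABLES = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


missing = missing_credentials()
if missing:
    print("Set these variables before tracking S3 artifacts:", ", ".join(missing))
else:
    print("AWS credentials found in the environment.")
```

On Windows, setx stores the value for future sessions only, so open a new terminal before running the check.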

For more information, see the AWS documentation.

Google Cloud Storage

For GCS, you need to set the storage endpoint URL (https://storage.googleapis.com) to an environment variable named S3_ENDPOINT_URL.

Linux

export S3_ENDPOINT_URL='https://storage.googleapis.com'

macOS

export S3_ENDPOINT_URL='https://storage.googleapis.com'

Windows

To set the variable permanently:

setx S3_ENDPOINT_URL "https://storage.googleapis.com"

Also set your GCS credentials to the following environment variables:

export AWS_ACCESS_KEY_ID='Your_GCS_service_account_key_here'
export AWS_SECRET_ACCESS_KEY='Your_GCS_service_account_secret_here'

To find your information:

1. On the Google Cloud console, go to the Cloud Storage Buckets page.
2. Navigate to Settings → Interoperability.
3. The Storage URI is the value you need for the S3_ENDPOINT_URL environment variable.
4. Check the HMAC key identifiers:
   - The access key is the value for AWS_ACCESS_KEY_ID.
   - The secret is the value for AWS_SECRET_ACCESS_KEY.

For details, see the Google Cloud docs.

When specifying the URL to the GCS asset to track with Neptune, use the S3 protocol:

run["asset"].track_files("s3://path/to/asset")

Other providers

To access other S3-compatible storage providers, you need to set the storage endpoint URL to an environment variable named S3_ENDPOINT_URL.

export S3_ENDPOINT_URL='https://your/storage/endpoint.com'
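As an alternative to the shell, the variable can be set from Python at the top of a script. The sketch below assumes that the endpoint is read from the process environment when artifacts are tracked, so it must run before any track_files() call; that ordering is an assumption worth verifying in your setup. The helper set_s3_endpoint is hypothetical, and the GCS endpoint is used only as a sample value.

```python
import os
from urllib.parse import urlparse


def set_s3_endpoint(url):
    """Validate `url` and export it as S3_ENDPOINT_URL for this process (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an http(s) endpoint URL, got {url!r}")
    # Only affects the current process and its children, not the parent shell.
    os.environ["S3_ENDPOINT_URL"] = url
    return url


set_s3_endpoint("https://storage.googleapis.com")
print(os.environ["S3_ENDPOINT_URL"])
```

Unlike export or setx, this does not persist beyond the running process, which can be convenient when different scripts target different providers.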

Example

Once you've set up your credentials (and possibly endpoint), pass the S3 path to the track_files() method:

Single file

run["train_dataset"].track_files("s3://datasets/train.csv")

Folder

run["train/images"].track_files("s3://datasets/images")
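In an s3:// URL, the first segment after the scheme is the bucket name and the remainder is the object key or prefix. The following sketch, using a hypothetical helper split_s3_url, shows how the two example paths above break down.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// URL into (bucket, key_or_prefix) (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected an s3://bucket/key URL, got {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_url("s3://datasets/train.csv"))  # ('datasets', 'train.csv')
print(split_s3_url("s3://datasets/images"))     # ('datasets', 'images')
```

Here datasets is the bucket, so the credentials you configured need read access to that bucket.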


Querying artifact metadata

To learn how to download artifact metadata through the API, see Download artifact metadata.

Tutorials ≫ Data versioning
API reference ≫ Field types: Artifact