# Track artifacts

Instead of uploading entire files, you can track and version them in Neptune as artifacts. With the `track_files()` method, you can log metadata about datasets, models, and any other artifacts that can be stored as files.
An artifact can refer to a single file or a collection of files. For instance, if you track a folder, Neptune logs the metadata of each individual file as well as of the folder as a whole.
You can track the following for each artifact:
- The URL or file path
- The MD5 hash
- Size
- Last modified
About the MD5 hash
The hash of the artifact is calculated based on the file contents and metadata, such as the path, size, and last modification time. A change to any of these will result in a different hash, even if the file contents are exactly the same.
For details, see API reference ≫ Artifact.
## Example
1. Pass the path to a file or folder as an argument to the `track_files()` method (see the sketch after this list).
2. In the Neptune web app, open the run and navigate to the Artifacts tab.
3. Select artifacts to preview them and inspect the metadata.
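A minimal sketch, assuming placeholder file and folder paths:

```python
import neptune

run = neptune.init_run()

# Track a single file
run["train/dataset"].track_files("./datasets/train.csv")

# Track a whole folder: Neptune logs metadata for each file and for the folder itself
run["train/images"].track_files("./datasets/images")
```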
## Passing an absolute Windows file path
The file path is expected to be a URI, such as `file://c:/path/to/file`.

To correctly parse an absolute Windows path (`C:\Path\to\file`):

1. Work around the backslashes in one of the following ways:
    - Escape any backslashes
    - Convert the backslashes to forward slashes
    - Convert the file path to a raw string
2. Prepend `file://` to the path.
For example:

```python
import neptune

run = neptune.init_run()

path1 = "C:\\Path\\to\\file"  # escaped backslashes
path2 = "C:/Path/to/file"     # forward slashes
path3 = r"C:\Path\to\file"    # raw string

run["artifact1"].track_files(f"file://{path1}")
run["artifact2"].track_files(f"file://{path2}")
run["artifact3"].track_files(f"file://{path3}")
```
## Tracking artifacts from S3-compatible storage

You can version datasets or models stored on Amazon S3 or compatible storage (`s3://...`), such as MinIO or Google Cloud Storage (GCS).
### Amazon S3

You need to store your credentials for Amazon Web Services (AWS) as environment variables. For example, on Amazon S3, configure an IAM group policy with "S3ReadAccessOnly" permissions. Then, export the user access keys:
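For example, with placeholder values (substitute your own IAM user keys):

```bash
export AWS_ACCESS_KEY_ID='Your_AWS_key_here'        # IAM user access key ID
export AWS_SECRET_ACCESS_KEY='Your_AWS_secret_here' # IAM user secret access key
```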
Where to enter the command
- Linux: Command line
- macOS: Terminal app
- Windows: PowerShell or Command Prompt
- Jupyter Notebook: In a cell, prefixed with an exclamation mark: `! your-command-here`
For more information, see the AWS documentation.
### Google Cloud Storage

For GCS, you need to set the storage endpoint URL (`https://storage.googleapis.com`) to an environment variable named `S3_ENDPOINT_URL`.
Set the endpoint and your GCS credentials to the following environment variables:

```bash
# Storage endpoint URL for GCS
export S3_ENDPOINT_URL='https://storage.googleapis.com'
# HMAC key credentials for your service account
export AWS_ACCESS_KEY_ID='Your_GCS_service_account_key_here'
export AWS_SECRET_ACCESS_KEY='Your_GCS_service_account_secret_here'
```
To find your information:

1. On the Google Cloud console, go to the Cloud Storage Buckets page.
2. Navigate to Settings → Interoperability.
3. The Storage URI is the value you need for the `S3_ENDPOINT_URL` environment variable.
4. Check the HMAC key identifiers:
    - The access key is the value for `AWS_ACCESS_KEY_ID`.
    - The secret is the value for `AWS_SECRET_ACCESS_KEY`.

For details, see the Google Cloud docs.
When specifying the URL to the GCS asset to track with Neptune, use the S3 protocol:
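A sketch, assuming a hypothetical bucket named `my-gcs-bucket` (substitute your own bucket and object path):

```python
import neptune

run = neptune.init_run()

# GCS objects are addressed with the s3:// scheme once S3_ENDPOINT_URL
# points to https://storage.googleapis.com
run["train/dataset"].track_files("s3://my-gcs-bucket/datasets/train.csv")
```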
### Other providers

To access other S3-compatible storage providers, you need to set the storage endpoint URL to an environment variable named `S3_ENDPOINT_URL`.
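For example, for a self-hosted MinIO instance (the URL below is illustrative):

```bash
export S3_ENDPOINT_URL='https://minio.example.com'
```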
### Example

Once you've set up your credentials (and possibly endpoint), pass the S3 path to the `track_files()` method:
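A sketch with a placeholder bucket and object path:

```python
import neptune

run = neptune.init_run()

# Neptune reads the credentials (and S3_ENDPOINT_URL, if set) from the environment
run["train/dataset"].track_files("s3://my-bucket/datasets/train.csv")
```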
## Logging a custom hash

Apart from the default information tracked with the `track_files()` method, you can log additional metadata for your artifact. For example, to log a custom hash, use:

```python
run["train/dataset"].track_files("./datasets/train.csv")
run["train/latest_custom_hash"] = "custom hash"
```
If you log the custom hash to the same namespace as the artifact, the MD5 hash and the custom hash appear together in the All metadata tab in the Neptune app.
You can also include the logged metadata in custom dashboards.
## Querying artifact metadata
For how to download artifact metadata via API, see Download artifact metadata.
Related
- Tutorials ≫ Data versioning
- API reference ≫ Field types: `Artifact`