Track artifacts#
Instead of uploading entire files, you can track and version them in Neptune as artifacts.
With the track_files() method, you can log metadata about datasets, models, and any other artifacts that can be stored as files.
An artifact can refer to a file as well as a collection of files. For instance, if you track a folder with files inside, Neptune logs the metadata of each individual file and the whole folder.
You can track the following for each artifact:
- The URL and file path
- The MD5 hash
- Size
- Last modified
About the MD5 hash
The hash of the artifact is calculated based on the file contents and metadata, such as the path, size, and last modification time. A change to any of these will result in a different hash, even if the file contents are exactly the same.
For details, see API reference ≫ Artifact.
Example#
- Pass the path to a file or folder as an argument to the track_files() method, as in the sketch after this list.
- Navigate to the Run details view and select Artifacts.
- Select artifacts to preview them and inspect the metadata.
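A minimal sketch of the first step, assuming placeholder field names and file paths:

import neptune

run = neptune.init_run()

# Track a single file
run["dataset/train"].track_files("data/train.csv")

# Track a whole folder: Neptune logs metadata for the folder and each file in it
run["dataset/raw"].track_files("data/raw/")

run.stop()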
Passing an absolute Windows file path#
The file path is expected to be a URI, such as file://c:/path/to/file.
To correctly parse an absolute Windows path (C:\Path\to\file):
- Work around the backslashes in one of the following ways:
    - Escape any backslashes
    - Convert the backslashes to forward slashes
    - Convert the file path to a raw string
- Prepend file:// to the path.
For example:
import neptune

run = neptune.init_run()

path1 = "C:\\Path\\to\\file"  # escaped backslashes
path2 = "C:/Path/to/file"  # forward slashes
path3 = r"C:\Path\to\file"  # raw string

run["artifact1"].track_files(f"file://{path1}")
run["artifact2"].track_files(f"file://{path2}")
run["artifact3"].track_files(f"file://{path3}")
Tracking artifacts from S3-compatible storage#
You can version datasets or models stored on Amazon S3 or compatible storage (s3://...), such as MinIO or Google Cloud Storage (GCS).
Amazon S3#
You need to store your credentials for Amazon Web Services (AWS) as environment variables. For example, configure an IAM group policy with "S3ReadAccessOnly" permissions, then export the user access keys:
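A sketch for a Linux or macOS shell (the key values are placeholders):

export AWS_ACCESS_KEY_ID='Your_AWS_key_here'
export AWS_SECRET_ACCESS_KEY='Your_AWS_secret_here'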
Where to enter the command
- Linux: Command line
- macOS: Terminal app
- Windows: PowerShell or Command Prompt
- Jupyter Notebook: In a cell, prefixed with an exclamation mark:
! your-command-here
For more information, see the AWS documentation.
Google Cloud Storage#
For GCS, you need to set the storage endpoint URL (https://storage.googleapis.com) as an environment variable named S3_ENDPOINT_URL.
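For example, in a Linux or macOS shell:

export S3_ENDPOINT_URL='https://storage.googleapis.com'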
Also store your GCS credentials in the following environment variables:
export AWS_ACCESS_KEY_ID='Your_GCS_service_account_key_here'
export AWS_SECRET_ACCESS_KEY='Your_GCS_service_account_secret_here'
To find your information:
- On the Google Cloud console, go to the Cloud Storage Buckets page.
- Navigate to Settings → Interoperability.
- The Storage URI is the value you need for the S3_ENDPOINT_URL environment variable.
- Check the HMAC key identifiers:
    - The access key is the value for AWS_ACCESS_KEY_ID.
    - The secret is the value for AWS_SECRET_ACCESS_KEY.
For details, see the Google Cloud docs.
When specifying the URL to the GCS asset to track with Neptune, use the S3 protocol:
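A minimal sketch, with a placeholder bucket and path:

import neptune

run = neptune.init_run()
run["dataset/raw"].track_files("s3://my-gcs-bucket/datasets/raw/")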
Other providers#
To access other S3-compatible storage providers, you need to set the storage endpoint URL as an environment variable named S3_ENDPOINT_URL.
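For example, for a self-hosted MinIO deployment (the endpoint URL is a placeholder):

export S3_ENDPOINT_URL='https://minio.example.com'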
Example#
Once you've set up your credentials (and, if needed, the endpoint), pass the S3 path to the track_files() method:
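A minimal sketch, assuming the credentials are already exported and using a placeholder bucket:

import neptune

run = neptune.init_run()
run["dataset/train"].track_files("s3://my-bucket/datasets/train/")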
Querying artifact metadata#
For how to download artifact metadata via API, see Download artifact metadata.
Related
- Use cases ≫ Data versioning tutorial
- API ≫ Field types: Artifact