
Download artifact metadata

Calling the track_files() method creates an Artifact field that contains metadata about the tracked files. The field can reference either a single file or a collection of files.

This guide shows how to fetch metadata from the artifact field and download the files it references.

Assumptions

In this guide, we assume the following file structure:

.
|-- datasets/
    |-- train/
        |-- sample.csv
        |-- ...
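If you want to follow along without your own data, the minimal sketch below creates a matching layout. Only the datasets/train/sample.csv structure comes from this guide; the helper name create_example_datasets and the CSV contents are made up for illustration.

```python
from pathlib import Path


def create_example_datasets(root: Path) -> Path:
    """Create datasets/train/sample.csv under `root` and return the datasets/ path."""
    train_dir = root / "datasets" / "train"
    train_dir.mkdir(parents=True, exist_ok=True)  # also creates datasets/
    # Placeholder rows for illustration only; any CSV content works
    (train_dir / "sample.csv").write_text("text,label\nhello world,0\ngoodbye world,1\n")
    return root / "datasets"
```

Calling create_example_datasets(Path(".")) from the directory where you later run track_files("datasets/") produces the structure assumed in this guide.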

We log the datasets/ folder under the field "data_versions".

>>> import neptune
>>> run = neptune.init_run()  # creates a run with the example identifier "CLS-45"
>>> run["data_versions"].track_files("datasets/")
>>> run.stop()

Later, we can connect to the run by passing its Neptune ID at initialization:

>>> run = neptune.init_run(with_id="CLS-45", mode="read-only")
https://app.neptune.ai/ml-team/classification/e/CLS-45
How do I find the ID?

The Neptune ID is a unique identifier for the object. In the table view, it's displayed in the leftmost column.

The ID is stored in the system namespace. If the object is active, you can obtain its ID with neptune_object["sys/id"].fetch(). For example:

>>> run = neptune.init_run(project="ml-team/classification")
>>> run["sys/id"].fetch()
'CLS-26'
If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"

You can, however, also pass them as arguments when initializing Neptune:

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8",  # your token here
    project="ml-team/classification",  # your full project name here
)

To find these values in the Neptune app:
  • API token: In the bottom-left corner, expand the user menu and select Get my API token.
  • Project name: In the top-right menu, select Edit project details.
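To catch a missing token or project name before initializing a run, you could check the environment first. This is a minimal sketch using only the standard library; the helper name missing_neptune_env is hypothetical and not part of the Neptune client.

```python
from __future__ import annotations

import os

# Environment variables recommended above
REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_env(env=None) -> list[str]:
    """Return the names of required Neptune variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_neptune_env()
    if missing:
        print("Set these before calling neptune.init_run():", ", ".join(missing))
    else:
        print("Neptune credentials found in the environment.")
```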

If you haven't registered, you can also log anonymously to a public project (make sure not to publish sensitive data through your code!):

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/quickstarts",
)

Fetching the artifact hash

To obtain the hash of the artifact, use the fetch_hash() method on the artifact field:

>>> import neptune
>>> run = neptune.init_run(with_id="CLS-45", mode="read-only")
https://app.neptune.ai/ml-team/classification/e/CLS-45
>>> run["data_versions"].fetch_hash() 
'4e2f79947dfc5ca977c507f905792fae98c49a4b1df795d81e80279e3ce7be8c'
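Since the hash identifies the tracked set of files, comparing hashes is one way to see which runs used the same dataset version. The following sketch groups run IDs by artifact hash. Both helper names are hypothetical; the Neptune-backed fetcher assumes the read-only init_run() pattern and the "data_versions" field used in this guide, and imports neptune lazily so the grouping logic works on its own.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable


def group_runs_by_hash(
    run_ids: Iterable[str], fetch_hash: Callable[[str], str]
) -> dict[str, list[str]]:
    """Map each artifact hash to the run IDs whose artifact has that hash."""
    groups: dict[str, list[str]] = defaultdict(list)
    for run_id in run_ids:
        groups[fetch_hash(run_id)].append(run_id)
    return dict(groups)


def neptune_hash_fetcher(field: str = "data_versions") -> Callable[[str], str]:
    """Build a fetcher that opens each run read-only and reads its artifact hash."""
    import neptune  # imported lazily; only needed when this fetcher is used

    def fetch(run_id: str) -> str:
        run = neptune.init_run(with_id=run_id, mode="read-only")
        try:
            return run[field].fetch_hash()
        finally:
            run.stop()  # close the connection even if fetching fails

    return fetch
```

For example, group_runs_by_hash(["CLS-45", "CLS-46"], neptune_hash_fetcher()) returns one key per distinct hash; CLS-46 stands in for a hypothetical second run. Passing the fetcher as a callable keeps the grouping logic independent of Neptune, so you can reuse it with hashes you have already fetched.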

Fetching metadata of contained files

You can fetch the metadata of files inside an artifact with the fetch_files_list() method. This returns a list of ArtifactFileData objects, each with the following properties:

  • file_hash: Hash of the file.
  • file_path: Path of the file, relative to the root of the virtual artifact directory.
  • size: Size of the file, in kilobytes.
  • metadata: Dictionary with the keys:
    • file_path: URL of the file (absolute path in local or S3-compatible storage).
    • last_modified: When the file was last modified.

The following example shows how to access the properties of the returned ArtifactFileData objects.

>>> import neptune
>>> run = neptune.init_run(with_id="CLS-45", mode="read-only")
https://app.neptune.ai/ml-team/classification/e/CLS-45
>>> artifact_list = run["data_versions"].fetch_files_list()

You can now access metadata through artifact_list:

>>> artifact_list[0].file_hash
'e54fdfced68d7e057eda168a05910fe609fc27f5'
>>> artifact_list[0].file_path
'train/sample.csv'
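Because each entry carries both file_path and file_hash, you can compare the file lists of two artifact versions to see which files were added, removed, or modified. This is a minimal sketch that relies only on those two attributes; diff_file_lists is a hypothetical helper, not a Neptune method.

```python
from __future__ import annotations

from typing import Iterable


def diff_file_lists(old: Iterable, new: Iterable) -> dict[str, list[str]]:
    """Compare two fetch_files_list() results by file_path and file_hash.

    Works with any objects exposing `file_path` and `file_hash` attributes,
    such as ArtifactFileData entries.
    """
    old_hashes = {f.file_path: f.file_hash for f in old}
    new_hashes = {f.file_path: f.file_hash for f in new}
    return {
        # Paths present only in the newer list
        "added": sorted(new_hashes.keys() - old_hashes.keys()),
        # Paths present only in the older list
        "removed": sorted(old_hashes.keys() - new_hashes.keys()),
        # Paths in both lists whose hash differs
        "changed": sorted(
            path
            for path in old_hashes.keys() & new_hashes.keys()
            if old_hashes[path] != new_hashes[path]
        ),
    }
```

You might call it as diff_file_lists(old_run["data_versions"].fetch_files_list(), artifact_list), where old_run is a hypothetical second run opened in read-only mode.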

The metadata field of an individual file is a dictionary with the following keys: "file_path" (path of the file, either on local storage or S3-compatible storage) and "last_modified".

>>> artifact_list[0].metadata["last_modified"]
'2022-09-30 10:50:40'
>>> artifact_list[0].metadata["file_path"]
'file:///home/jackie/projects/text-classification/datasets/train/sample.csv'
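The metadata values are returned as strings, so you may want to convert them before use. The sketch below turns a file:// URL into a local path and parses last_modified into a datetime. It assumes the timestamp always uses the layout shown in the example above; parse_file_metadata is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime
from urllib.parse import unquote, urlparse


def parse_file_metadata(metadata: dict) -> tuple[str, datetime]:
    """Turn an ArtifactFileData.metadata dict into (location, timestamp).

    Assumes "last_modified" uses the "YYYY-MM-DD HH:MM:SS" layout.
    For file:// URLs the location is returned as a local path; other URLs
    (for example s3://...) are returned unchanged.
    """
    url = metadata["file_path"]
    parsed = urlparse(url)
    location = unquote(parsed.path) if parsed.scheme == "file" else url
    modified = datetime.strptime(metadata["last_modified"], "%Y-%m-%d %H:%M:%S")
    return location, modified
```

For example, parse_file_metadata(artifact_list[0].metadata) would return the local path of sample.csv together with its modification time as a datetime object.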

Downloading contained files

You can also download all the files that are referenced in the artifact field with the download() method.

Neptune looks for each file at the path where it was originally logged.
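Because of this, a local file that was moved or deleted after it was logged can't be found at its original path. Before calling download(), you could list such files with a sketch like the following. The helper missing_local_sources is hypothetical; it relies only on the metadata["file_path"] key of each fetch_files_list() entry and skips URLs that aren't file://, such as S3 objects, which are fetched from remote storage.

```python
from __future__ import annotations

import os
from typing import Iterable
from urllib.parse import unquote, urlparse


def missing_local_sources(files: Iterable) -> list[str]:
    """List local (file://) sources from fetch_files_list() entries that no longer exist."""
    missing = []
    for f in files:
        parsed = urlparse(f.metadata["file_path"])
        if parsed.scheme != "file":
            continue  # remote objects are not checked on the local disk
        path = unquote(parsed.path)
        if not os.path.exists(path):
            missing.append(path)
    return missing
```

For example, missing_local_sources(run["data_versions"].fetch_files_list()) returns an empty list when every locally tracked file is still in place.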

Note for Windows
  • This method creates symbolic links to the referenced files.
  • You may need to run your terminal program as administrator to grant the client the permissions it needs to create these links on your local system.

>>> import neptune
>>> run = neptune.init_run(with_id="CLS-45", mode="read-only")
https://app.neptune.ai/ml-team/classification/e/CLS-45
>>> run["data_versions"].download(destination="downloaded_artifact")

If the artifact points to an object stored in S3 or GCS, the method downloads the object to your local system directly from the remote storage.

