Great Expectations OSS integration guide

Great Expectations OSS (GX OSS) is an open-source tool for validating, documenting, and monitoring your data. With Neptune, you can:

  • Log GX OSS's configurations
  • Log validation results and display them in the Neptune app
  • Upload GX OSS's rich HTML reports and interact with them in the Neptune app

GX OSS metadata, validation results, and HTML reports visualized in the Neptune app.

Before you start

To log metadata to Neptune, create a run object in your script. A run object holds the metadata that you want to track, plus the automatically logged system metrics.

To set up logging:

  1. Save your Neptune API token and full project name as environment variables.

    How do I save my credentials as environment variables?

    Set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

    Linux or macOS:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
    export NEPTUNE_PROJECT="ml-team/classification"

    Windows:

    setx NEPTUNE_API_TOKEN "h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
    setx NEPTUNE_PROJECT "ml-team/classification"

    You can also navigate to Settings → Edit the system environment variables and add the variables there.

    Jupyter Notebook:

    %env NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
    %env NEPTUNE_PROJECT="ml-team/classification"

    To find your credentials:

    • API token: In the bottom-left corner of the Neptune app, expand your user menu and select Get your API token. If you need the token of a service account, go to the workspace or project settings and enter the Service accounts settings.
    • Project name: Your full project name has the form workspace-name/project-name. You can copy it from the project menu (Details & privacy).

    If you're working in Google Colab, you can set your credentials with the os and getpass libraries:

    import os
    from getpass import getpass
    os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
    os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"
  2. In your script, import Neptune:

    import neptune
  3. Initialize a Neptune run:

    run = neptune.init_run()

    You can specify additional run parameters, such as tags or a description. For a full list of options, see the API reference.

  4. Log the GX OSS metadata. For example, to log a Data Context configuration under the gx/context/config namespace of a run, use:

    run["gx/context/config"] = context.get_config().to_json_dict()

    For details, see Logging examples.

  5. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
  6. Run your script.

    To open the run and watch the logging live, click the Neptune link that appears in the console output.


Use the Neptune app to visualize, compare, and organize your logged metadata. For details, see Experiments.
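
Before calling init_run(), it can help to confirm that both environment variables from step 1 are set. The following is a minimal sketch that uses only the standard library. The helper names `missing_neptune_credentials` and `looks_like_full_project_name` are hypothetical and not part of the Neptune API.

```python
import os

REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def looks_like_full_project_name(name):
    """Check for the workspace-name/project-name form (exactly one slash)."""
    workspace, sep, project = name.partition("/")
    return bool(workspace) and bool(sep) and bool(project) and "/" not in project


if __name__ == "__main__":
    missing = missing_neptune_credentials()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    elif not looks_like_full_project_name(os.environ["NEPTUNE_PROJECT"]):
        print("NEPTUNE_PROJECT should have the form workspace-name/project-name")
    else:
        print("Credentials found; neptune.init_run() can pick them up.")
```

Running a check like this at the top of a script surfaces a missing token or a malformed project name before any connection attempt is made.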

Logging examples

You can organize the logged metadata into a folder-like structure with the namespaces and fields of a run object. For details, see Namespaces and fields.


To view the logging examples in the Neptune app, check the GX metadata dashboard.

Log a Data Context configuration

To log a Data Context configuration under the gx/context/config namespace of a run, use:

run["gx/context/config"] = context.get_config().to_json_dict()
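
When a dictionary is assigned to a namespace, each nested key ends up as a field beneath it. As an illustration only, the sketch below flattens a small, made-up configuration dictionary into the field paths you would expect to see in the app. `flatten_to_field_paths` is a hypothetical helper, and the sample keys are not a real Data Context configuration.

```python
def flatten_to_field_paths(data, prefix):
    """Map a nested dict to {"prefix/key/subkey": value} pairs."""
    paths = {}
    for key, value in data.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            # Nested dictionaries become sub-namespaces.
            paths.update(flatten_to_field_paths(value, path))
        else:
            paths[path] = value
    return paths


# Made-up stand-in for context.get_config().to_json_dict()
sample_config = {
    "config_version": 3.0,
    "stores": {"expectations_store": {"class_name": "ExpectationsStore"}},
}

field_paths = flatten_to_field_paths(sample_config, "gx/context/config")
for path, value in field_paths.items():
    print(path, "=", value)
```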

Log a Checkpoint configuration

The Checkpoint configuration contains values of types that Neptune doesn't support, such as lists. To convert these values to strings, use the stringify_unsupported() function.

To log a Checkpoint config under the gx/checkpoint/config namespace of a run, use:

from neptune.utils import stringify_unsupported

run["gx/checkpoint/config"] = stringify_unsupported(checkpoint.config.to_json_dict())
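
To make the effect concrete, here is a rough pure-Python illustration of turning list values into strings inside a nested dictionary. It is not Neptune's implementation of stringify_unsupported(). The function name `stringify_lists` and the Checkpoint-like sample dictionary are assumptions made for illustration.

```python
def stringify_lists(value):
    """Recursively replace list and tuple values with their string form."""
    if isinstance(value, dict):
        return {key: stringify_lists(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return str(value)
    return value


# Made-up stand-in for checkpoint.config.to_json_dict()
sample_checkpoint_config = {
    "name": "my_checkpoint",
    "action_list": [{"name": "store_validation_result"}],
    "runtime_configuration": {"result_format": "SUMMARY"},
}

converted = stringify_lists(sample_checkpoint_config)
print(converted["action_list"])
```

Values that are already strings, numbers, or nested dictionaries pass through unchanged. Only the list is replaced by its text representation.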

Log Expectations

To log and organize your Expectations, use:

expectation_suite = validator.get_expectation_suite().to_json_dict()

# Log the suite-level metadata to the `gx/meta` namespace of a run:
run["gx/meta"] = expectation_suite["meta"]

# Log the Expectation Suite name to the `gx/expectations/expectations_suite_name`
# field of a run:
run["gx/expectations/expectations_suite_name"] = expectation_suite[
    "expectation_suite_name"
]

# Create a numbered folder for each Expectation in the `gx/expectations` namespace:
for idx, expectation in enumerate(expectation_suite["expectations"]):
    run["gx/expectations"][idx] = expectation

Log validation results

By saving validation results as a dictionary, you can access them programmatically and use them in your CI/CD pipelines:

results_dict = checkpoint_result.list_validation_results()[0].to_json_dict()

run["gx/validations/json"] = results_dict

for idx, result in enumerate(results_dict["results"]):
    run["gx/validations/json/results"][idx] = result
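
As one way to use such a dictionary in a CI/CD job, the sketch below collects the Expectations that failed and derives an exit code. It assumes the `success`, `results`, and `expectation_config` keys that a GX validation result typically contains. The sample data and the helper name `failed_expectation_types` are made up for illustration.

```python
def failed_expectation_types(results_dict):
    """Return the expectation types of all results marked as unsuccessful."""
    failed = []
    for result in results_dict.get("results", []):
        if not result.get("success", False):
            config = result.get("expectation_config", {})
            failed.append(config.get("expectation_type", "unknown"))
    return failed


# Made-up stand-in for checkpoint_result.list_validation_results()[0].to_json_dict()
sample_results = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {"expectation_type": "expect_column_to_exist"},
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_not_be_null"
            },
        },
    ],
}

failed = failed_expectation_types(sample_results)
exit_code = 0 if sample_results["success"] else 1
print(f"{len(failed)} failed Expectation(s): {failed}; exit code {exit_code}")
```

In a pipeline, passing `exit_code` to sys.exit() would fail the job whenever validation does not succeed.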

Upload HTML reports

You can also upload the rich HTML reports to Neptune and then interact with them in the app:

  1. HTML reports are available only for a FileDataContext. If your context is an EphemeralDataContext, start by converting it to a FileDataContext:

    from great_expectations.data_context import EphemeralDataContext
    import os
    if isinstance(context, EphemeralDataContext):
        context = context.convert_to_file_context()
  2. Fetch the local_site_path of a Data Context:

    local_site_path = os.path.dirname(context.build_data_docs()["local_site"])[7:]
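  3. To log the HTML Expectations and Validation reports, use:

    # The field paths below are example names; adjust them to your own layout.
    run["gx/validations/docs/html/expectations"].upload_files(
        os.path.join(local_site_path, "expectations")
    )
    run["gx/validations/docs/html/validations"].upload_files(
        os.path.join(local_site_path, "validations")
    )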
  4. To view the uploaded reports in the Neptune app, navigate to All metadata, or create a custom dashboard with a File preview widget.
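
The `[7:]` slice in step 2 drops the leading `file://` scheme from the URL returned by build_data_docs(). The sketch below shows the same conversion on a made-up URL, next to an alternative based on urllib.parse. The alternative is a suggestion rather than something this guide prescribes, and the sample path is hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlparse

# Made-up example of a value like context.build_data_docs()["local_site"]
local_site_url = "file:///tmp/gx/uncommitted/data_docs/local_site/index.html"

# Approach used in this guide: take the parent directory, then drop "file://"
sliced_path = posixpath.dirname(local_site_url)[7:]

# Alternative: parse the URL first, then take the parent directory of its path
parsed_path = posixpath.dirname(unquote(urlparse(local_site_url).path))

print(sliced_path)
print(parsed_path)
```

Parsing the URL avoids relying on a fixed prefix length and also decodes percent-escaped characters such as spaces in directory names.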