Great Expectations OSS integration guide#

Great Expectations OSS (GX OSS) is an open-source tool for validating, documenting, and monitoring your data. With Neptune, you can:

  • Log GX OSS's configurations
  • Log validation results and display them in the Neptune app
  • Upload GX OSS's rich HTML reports and interact with them in the Neptune app

GX OSS metadata, validation results, and HTML reports visualized in the Neptune app.

Before you start#

To see how the integration works without setting up your environment, run the example in Colab.

Quickstart#

To log metadata to Neptune, create a run object in your script. A run object holds the metadata that you want to track, plus the system metrics that Neptune logs automatically.

To log metadata to Neptune:

  1. Save your Neptune API token and full project name as environment variables.

    How do I save my credentials as environment variables?

    Set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

    Linux or macOS:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
    export NEPTUNE_PROJECT="ml-team/classification"

    Windows:

    setx NEPTUNE_API_TOKEN "h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
    setx NEPTUNE_PROJECT "ml-team/classification"

    You can also navigate to Settings → Edit the system environment variables and add the variables there.

    Jupyter Notebook:

    %env NEPTUNE_API_TOKEN=h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ==
    %env NEPTUNE_PROJECT=ml-team/classification

    The %env magic stores everything after the equals sign literally, so leave out the quotation marks there.


    To find your credentials:

    • API token: In the bottom-left corner of the Neptune app, expand your user menu and select Get your API token. If you need the token of a service account, go to the workspace or project settings and open the Service accounts settings.
    • Project name: Your full project name has the form workspace-name/project-name. You can copy it from the project menu (Details & privacy).

    If you're working in Google Colab, you can set your credentials with the os and getpass libraries:

    import os
    from getpass import getpass
    os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
    os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"
    
  2. In your script, import Neptune:

    import neptune
    
  3. Initialize a Neptune run:

    run = neptune.init_run()
    

    You can specify additional run parameters, such as tags or a description. For a full list of options, see the API reference.

  4. Log the GX OSS metadata. For example, to log a Data Context configuration under the gx/context/config namespace of a run, use:

    run["gx/context/config"] = context.get_config().to_json_dict()
    

    For details, see Logging examples.

  5. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
    
  6. Run your script.

    To open the run and watch the logging live, click the Neptune link that appears in the console output.

    Example link: https://app.neptune.ai/o/showcase/org/great-expectations/e/GX-1

Use the Neptune app to visualize, compare, and organize your logged metadata. For details, see Experiments.
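
Because step 3 calls neptune.init_run() without arguments, the run relies on the two environment variables from step 1. A script can check for them up front and fail with a clear message instead of a connection error. The following is a minimal sketch that uses only the standard library; the helper name missing_neptune_credentials is hypothetical and not part of the Neptune API.

```python
import os

# Environment variable names taken from step 1 of the quickstart.
REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not part of the Neptune API.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_neptune_credentials()
    if missing:
        print("Set these variables before calling neptune.init_run():", ", ".join(missing))
    else:
        print("Neptune credentials found in the environment.")
```

Passing a plain dictionary as env makes the check easy to exercise in tests without touching the real environment.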

Logging examples#

You can organize the logged metadata into a folder-like structure with the namespaces and fields of a run object. For details, see Namespaces and fields.
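
When you assign a nested dictionary to a namespace, the dictionary keys act as folder names and each leaf value becomes a field. The sketch below mimics that path layout in plain Python so that you can preview the field paths before logging. It is an illustration only, not how Neptune implements the mapping; flatten_to_fields is a hypothetical helper and the sample configuration is made up.

```python
def flatten_to_fields(prefix, value):
    """Map a nested dict to {"path/to/field": leaf_value} pairs."""
    if isinstance(value, dict):
        fields = {}
        for key, child in value.items():
            fields.update(flatten_to_fields(f"{prefix}/{key}", child))
        return fields
    return {prefix: value}


# Made-up stand-in for the output of context.get_config().to_json_dict().
sample_config = {
    "config_version": 3.0,
    "stores": {"expectations_store": {"class_name": "ExpectationsStore"}},
}

# Print every field path that the nested dictionary would occupy.
for path, leaf in flatten_to_fields("gx/context/config", sample_config).items():
    print(path, "=", leaf)
```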

Tip

To view the logging examples in the Neptune app, check the GX metadata dashboard.

Log a Data Context configuration#

To log a Data Context configuration under the gx/context/config namespace of a run, use:

run["gx/context/config"] = context.get_config().to_json_dict()

Log a Checkpoint configuration#

The Checkpoint configuration contains values of types that Neptune fields don't support, such as lists. To convert all the lists to strings, use the stringify_unsupported() method.

To log a Checkpoint config under the gx/checkpoint/config namespace of a run, use:

from neptune.utils import stringify_unsupported

run["gx/checkpoint/config"] = stringify_unsupported(checkpoint.config.to_json_dict())
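
To make the effect of the conversion concrete, the sketch below walks a nested dictionary and replaces every list with its string form. It is a simplified stand-in written for illustration, not the implementation of stringify_unsupported(); the function name lists_to_strings and the sample Checkpoint configuration are hypothetical.

```python
def lists_to_strings(value):
    """Recursively replace lists and tuples with their str() form.

    Simplified illustration only; not Neptune's stringify_unsupported().
    """
    if isinstance(value, dict):
        return {key: lists_to_strings(child) for key, child in value.items()}
    if isinstance(value, (list, tuple)):
        return str(value)
    return value


# Made-up stand-in for checkpoint.config.to_json_dict().
sample_checkpoint_config = {
    "name": "my_checkpoint",
    "action_list": [{"name": "store_validation_result"}],
    "runtime_configuration": {"result_format": "SUMMARY"},
}

# The original dictionary is left unchanged; a converted copy is returned.
converted = lists_to_strings(sample_checkpoint_config)
print(converted["action_list"])
```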

Log Expectations#

To log and organize your Expectations, use:

expectation_suite = validator.get_expectation_suite().to_json_dict()

run["gx/meta"] = expectation_suite["meta"]

# Log the Expectation Suite name to the `gx/expectations/expectations_suite_name` 
# field of a run:
run["gx/expectations/expectations_suite_name"] = expectation_suite[
    "expectation_suite_name"
]

# Create a numbered folder for each Expectation in the `gx/expectations` namespace:
for idx, expectation in enumerate(expectation_suite["expectations"]):
    run["gx/expectations"][idx] = expectation

Log validation results#

By saving validation results as a dictionary, you can access them programmatically and use them in your CI/CD pipelines:

results_dict = checkpoint_result.list_validation_results()[0].to_json_dict()

run["gx/validations/json"] = results_dict

for idx, result in enumerate(results_dict["results"]):
    run["gx/validations/json/results"][idx] = result
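
As one way to use the dictionary in a CI/CD pipeline, the sketch below collects the Expectations that failed and turns the overall outcome into a process exit code. The sample dictionary is hand-written and only assumes the keys shown (success, results, expectation_config, expectation_type); check them against the output of your GX OSS version. failed_expectations and exit_code are hypothetical helpers.

```python
def failed_expectations(results_dict):
    """Return the expectation types of all results whose success flag is not True."""
    failed = []
    for result in results_dict.get("results", []):
        if not result.get("success", False):
            config = result.get("expectation_config", {})
            failed.append(config.get("expectation_type", "<unknown>"))
    return failed


def exit_code(results_dict):
    """Return 0 when the overall validation passed, 1 otherwise."""
    return 0 if results_dict.get("success", False) else 1


# Hand-written sample; the key names are assumptions about the JSON layout.
sample_results = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {"expectation_type": "expect_column_to_exist"},
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_not_be_null"
            },
        },
    ],
}

print("Failed:", failed_expectations(sample_results))
print("Exit code:", exit_code(sample_results))
```

In a pipeline step, passing exit_code(results_dict) to sys.exit() after run.stop() would fail the job whenever validation fails.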

Upload HTML reports#

You can also upload the rich HTML reports to Neptune and then interact with them in the app:

  1. HTML reports are available only for FileDataContext. If your context is an EphemeralDataContext, start by converting it to a FileDataContext:

    from great_expectations.data_context import EphemeralDataContext
    import os
    
    if isinstance(context, EphemeralDataContext):
        context = context.convert_to_file_context()
    
  2. Fetch the local_site_path of a Data Context:

    # build_data_docs() returns a file:// URL for each site; [7:] strips the "file://" prefix
    local_site_path = os.path.dirname(context.build_data_docs()["local_site"])[7:]
    
  3. To log the HTML Expectations and Validation reports, use:

    run["gx/expectations/reports"].upload_files(
        os.path.join(local_site_path, "expectations")
    )

    run["gx/validations/reports"].upload_files(
        os.path.join(local_site_path, "validations")
    )
    
  4. To view the uploaded reports in the Neptune app, navigate to All metadata, or create a custom dashboard with a File preview widget.
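
The [7:] slice in step 2 removes the file:// prefix from the URL that build_data_docs() returns. A slightly more defensive variant, sketched below with the standard library, parses the URL instead of slicing it and also decodes percent-escaped characters such as %20. The helper name local_site_dir and the sample URL are hypothetical, and the sample assumes a POSIX-style path.

```python
import os
from urllib.parse import urlparse
from urllib.request import url2pathname


def local_site_dir(index_url):
    """Return the directory that contains the Data Docs index page.

    Hypothetical helper; expects a file:// URL pointing at index.html.
    """
    parsed = urlparse(index_url)
    if parsed.scheme != "file":
        raise ValueError(f"Expected a file:// URL, got: {index_url}")
    # url2pathname decodes percent-escapes and applies OS path conventions.
    return os.path.dirname(url2pathname(parsed.path))


# Made-up URL in the shape that the [7:] slice in step 2 implies.
sample_url = "file:///tmp/gx/uncommitted/data_docs/local_site/index.html"
print(local_site_dir(sample_url))
```

The returned directory can be used in place of local_site_path when building the paths passed to upload_files().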