Self-hosted Neptune deployment configuration#
Self-hosted Neptune is composed of a set of microservices (Neptune Services) distributed as a Helm chart deployable on Kubernetes.
The Neptune installer consists of two files:
- `neptune_installation_{version}.tgz` – needs to be unpacked. It contains a single directory called `neptune_installation`.
- `configuration.yaml` – contains a minimal installation configuration. Before starting the installation, modify the defaults to suit your deployment scenario if needed.
Warning
The `configuration.yaml` file may contain Neptune's Docker registry credentials and your license for using Neptune. Treat it confidentially.
Depending on the installation type, some parameters are optional.
General configuration#
This section describes some basic options you need to configure before installing Neptune.
| Parameter | Description | Example value |
|---|---|---|
| `administrator_username` | Desired username of your administrator account. Can contain letters, numbers, and hyphens. | Recommended: `administrator` |
| `administrator_password` | Password for the administrator account. | Use a strong password. |
| `organization_name` | Name of your organization in Neptune. Can contain letters, numbers, and hyphens. | Something simple, related to your company or organization. |
| `deployment_type` | Set to `local` in case of single-node deployment, or `cluster` in case of deployment on an existing Kubernetes cluster. | - |
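For illustration, the general settings in `configuration.yaml` might look like the following sketch. The flat key layout and all values shown are assumptions; adjust them to match the file shipped with your installer.

```yaml
# Sketch of the general settings in configuration.yaml (illustrative values only).
administrator_username: administrator        # letters, numbers, and hyphens only
administrator_password: "<strong-password>"  # placeholder, use your own strong password
organization_name: my-company                # hypothetical organization name
deployment_type: local                       # "local" for single-node, "cluster" for an existing Kubernetes cluster
```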
Kubernetes cluster configuration#
Use options in this section to tell the Neptune installer how to interact with your Kubernetes cluster.
Note
Only use these options with cluster deployments (`deployment_type` set to `cluster`).
| Parameter | Description | Example value |
|---|---|---|
| `kubeconfig_path` | Absolute path to an existing kubectl configuration file pointing to the cluster where you want to deploy Neptune. | `/etc/rancher/k3s/k3s.yaml` |
| `namespace` | Kubernetes namespace where Neptune will be deployed. Created if it doesn't exist. | Default: `neptune` |
| `node_tolerations` | Array of node taint tolerations that the Neptune installation can use. This is copied directly to Kubernetes manifests. | - |
| `node_selector` | Key-value map that the Neptune installation should use to select nodes. This is copied directly to Kubernetes manifests. | - |
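As a sketch, the cluster-related options could be filled in as shown below. Because `node_tolerations` and `node_selector` are copied directly into Kubernetes manifests, the example uses the standard Kubernetes toleration and selector syntax; the exact nesting in `configuration.yaml` and the taint key used are assumptions.

```yaml
# Illustrative cluster settings (adjust to your environment).
kubeconfig_path: /etc/rancher/k3s/k3s.yaml
namespace: neptune
node_tolerations:                 # standard Kubernetes toleration objects (example taint)
  - key: dedicated
    operator: Equal
    value: neptune
    effect: NoSchedule
node_selector:                    # standard Kubernetes key-value node selector (example label)
  disktype: ssd
```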
Storage configuration#
Use the following options to configure how to provision storage for Neptune in your deployment.
The types of data that Neptune stores can be roughly split into two categories:
- Service storage – the storage required by MySQL, Elasticsearch, and Kafka. This data is always stored on a POSIX-compliant file system.
- Object storage – for the bulk of your data. This refers to ML metadata that Neptune tracks and stores, such as logs, numerical series, tabular data, and images. This data can be stored either on a POSIX-compliant file system or in S3-compatible object storage.
For more details about storage requirements and options, see Neptune's system requirements.
Regardless of the deployment type, you can always provide MySQL, Elasticsearch, Kafka, and S3-compatible object storage, in which case Neptune doesn't require additional storage.
Service storage#
If you prefer to use the MySQL, Elasticsearch, and Kafka services provided as part of the Neptune installer, they need a way to provision storage for themselves. The options depend on the deployment type.
| Parameter | Description | Example value |
|---|---|---|
| `storage_device` | Should point to an SSD device. You can retrieve the value from the output of the `lsblk` command. | For a clean VM, it's often `/dev/sdb`. |
| `storage_path` | Absolute path on the local system where Neptune will store its data. This is also where the disk is mounted. | Default: `/mnt/neptune` |
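For a local (single-node) deployment that relies on the bundled services, the storage settings might look like this sketch; the device name is only an example of what `lsblk` could report on a clean VM.

```yaml
# Illustrative local storage settings.
storage_device: /dev/sdb       # SSD device reported by lsblk
storage_path: /mnt/neptune     # absolute path where the device is mounted and data is stored
```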
If you use MySQL, Kafka, or Elasticsearch delivered as part of the Neptune installation, the following parameters allow Neptune to provision disk space for them:
| Parameter | Description |
|---|---|
| `ssd_storage_class` | Name of the storage class to be used by the MySQL and Elasticsearch services. |
| `hdd_storage_class` | Name of the storage class to be used for Kafka. |
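On a cluster deployment you would instead point the bundled services at existing storage classes, for example as in the sketch below; the class names are placeholders for whatever your cluster provides.

```yaml
# Illustrative storage classes for the bundled services.
ssd_storage_class: fast-ssd    # used by MySQL and Elasticsearch
hdd_storage_class: standard    # used by Kafka
```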
Object storage#
Note for local deployments
Object storage can be provisioned automatically by the installer.
You can implement object storage in one of the following ways:
- By providing your own file storage with the `storage_pvc_name` option.
- With S3-compatible storage.
Options in this section work regardless of whether you're deploying Neptune in local or cluster mode.
Neptune stores most of your data either on a POSIX file system or in object storage. In most situations, we strongly recommend using object storage.
| Parameter | Description |
|---|---|
| `storage_pvc_name` | Name of the Persistent Volume Claim present in the namespace to which Neptune will be installed. This claim needs to be of type …. This parameter is disregarded if S3-compatible object storage is provided. |
2.2 migration note
If you're migrating from Neptune version 2.1 to 2.2, the `storage_pvc_name` value will serve as the source for the migration. No new data will be written there.
| Parameter | Description |
|---|---|
| `s3_bucket_name` | Name of the bucket Neptune uses for storing most of your data. |
| `s3_service_endpoint` | The S3 service endpoint to connect to. |
| `s3_region` | The AWS region corresponding to the `s3_service_endpoint` parameter. For S3-compatible services, the value depends on the service. |
| `s3_access_key_id` | An S3 access key. |
| `s3_secret_access_key` | An S3 secret key. |
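A sketch of the S3-compatible settings is shown below. The bucket, endpoint, region, and credentials are placeholders; as noted above, `storage_pvc_name` is disregarded once S3 settings are provided.

```yaml
# Illustrative S3-compatible object storage settings (placeholder values).
s3_bucket_name: neptune-data
s3_service_endpoint: https://s3.us-east-1.amazonaws.com   # or your S3-compatible service endpoint
s3_region: us-east-1
s3_access_key_id: "<access-key-id>"
s3_secret_access_key: "<secret-access-key>"

# Alternatively, to use your own file storage instead of S3:
# storage_pvc_name: neptune-data-pvc   # existing PVC in the Neptune namespace
```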
Database configuration#
The Neptune installer can set up a MySQL database as part of the installation process, but you can also provide one yourself. You may want to do this especially in cloud environments, where the storage used by the database is automatically scaled.
| Parameter | Description |
|---|---|
| `db_host` | External MySQL database host for Neptune to use. |
| `db_port` | (Optional) External MySQL database port. Default: 3306 |
| `db_username` | The username of a user with access to all schemas used by Neptune. Required if `db_host` is set. |
| `db_password` | The password of the user specified in `db_username`. Required if `db_host` is set. |
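If you point Neptune at an external MySQL instance, the relevant settings might look like this sketch, with the host and credentials as placeholders:

```yaml
# Illustrative external MySQL settings.
db_host: mysql.internal.example.com
db_port: 3306                    # optional, 3306 is the default
db_username: neptune             # must have access to all Neptune schemas
db_password: "<db-password>"
```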
Required database schemas#
Before running the Neptune installer, create the listed database schemas and grant the user defined in `db_username` access to them.
Make sure that the schemas have UTF-8 as the default character set.
- `neptune_instance`
- `neptune_notifications`
- `neptune_discussions`
- `neptune_leaderboard`
- `neptune_keycloak`
- `neptune_artifacts`
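As a sketch of this preparation step, the schemas could be created and granted in MySQL as follows. The user name `neptune` and the `utf8mb4` character set are assumptions; substitute the user you set in `db_username` and the UTF-8 variant you use, and make sure the user exists first.

```sql
-- Create the schemas Neptune expects, with UTF-8 as the default character set.
CREATE DATABASE neptune_instance      DEFAULT CHARACTER SET utf8mb4;
CREATE DATABASE neptune_notifications DEFAULT CHARACTER SET utf8mb4;
CREATE DATABASE neptune_discussions   DEFAULT CHARACTER SET utf8mb4;
CREATE DATABASE neptune_leaderboard   DEFAULT CHARACTER SET utf8mb4;
CREATE DATABASE neptune_keycloak      DEFAULT CHARACTER SET utf8mb4;
CREATE DATABASE neptune_artifacts     DEFAULT CHARACTER SET utf8mb4;

-- Grant the Neptune database user (the one from db_username) access to all of them.
GRANT ALL PRIVILEGES ON neptune_instance.*      TO 'neptune'@'%';
GRANT ALL PRIVILEGES ON neptune_notifications.* TO 'neptune'@'%';
GRANT ALL PRIVILEGES ON neptune_discussions.*   TO 'neptune'@'%';
GRANT ALL PRIVILEGES ON neptune_leaderboard.*   TO 'neptune'@'%';
GRANT ALL PRIVILEGES ON neptune_keycloak.*      TO 'neptune'@'%';
GRANT ALL PRIVILEGES ON neptune_artifacts.*     TO 'neptune'@'%';
```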
Elasticsearch configuration#
The Neptune installer can set up an Elasticsearch instance as part of the installation process, but you can also provide one yourself.
| Parameter | Description |
|---|---|
| `elasticsearch_address` | Address of the external Elasticsearch server host in the format `https://<address>[:<http api port>]`. The `<http api port>` is optional and defaults to 443 for HTTPS connections. |
| `elasticsearch_cluster_name` | Name of your Elasticsearch cluster. Defaults to `elasticsearch`, which is the default cluster name in Elasticsearch. |
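Pointing Neptune at an external Elasticsearch cluster might look like the sketch below; the address is a placeholder that follows the format described above.

```yaml
# Illustrative external Elasticsearch settings.
elasticsearch_address: https://elasticsearch.internal.example.com:9200
elasticsearch_cluster_name: elasticsearch   # default cluster name unless you renamed yours
```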
Kafka configuration#
The Neptune installer can set up a Kafka instance as part of the installation process, but you can also provide one yourself.
| Parameter | Description |
|---|---|
| `kafka_address` | Address of the external Kafka cluster, as a comma-separated list of hosts. Required format: … |
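A sketch of the external Kafka setting is shown below; the host:port form of each entry is an assumption modeled on the usual Kafka bootstrap-server notation.

```yaml
# Illustrative external Kafka address (comma-separated list of hosts).
kafka_address: kafka-0.internal.example.com:9092,kafka-1.internal.example.com:9092
```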
Additional configuration#
Identity management system#
In some cases, you may want to connect Neptune to your own identity management system (like LDAP).
If your identity management system uses a certificate issued by your own Certificate Authority, you need to provide a set of trusted certificates to Neptune (so that it's able to verify the security of the connection).
| Parameter | Description |
|---|---|
| `trusted_certificates` | List of absolute paths to files containing certificates that are to be trusted. |
| `keycloak_java_opts` | Rarely needed option that allows overriding some default behaviors of Java. The value is a string containing Java options to be added to Keycloak (the component responsible for connecting to an external identity management system). |
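For example, trusting certificates issued by your own CA could look like the following sketch; the file paths are placeholders, and `keycloak_java_opts` is shown commented out since it's rarely needed.

```yaml
# Illustrative identity-management-related settings.
trusted_certificates:
  - /etc/neptune/certs/corporate-root-ca.pem
  - /etc/neptune/certs/corporate-intermediate-ca.pem
# keycloak_java_opts: "<extra Java options for Keycloak, rarely needed>"
```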
Ingress controller#
Head to Expose Neptune to learn how to configure the ingress controller and expose Neptune from your VM or cluster to the outside.