Neptune Query Language (NQL)#
Experimental
This feature is experimental. We're happy to hear your feedback through GitHub!
When using fetch_runs_table() to fetch runs from your project, you can pass a raw NQL string to the query argument.
import neptune
project = neptune.init_project()
project.fetch_runs_table(
    query='(`sys/tags`:stringSet CONTAINS "some-tag") AND (`f1`:float >= 0.85)'
)
This way, you can filter runs by any field, using a range of criteria.
How is NQL different from the app search?
The search query builder in the web app adds some extra functionality on top of NQL to make query building more convenient. Queries are converted to raw NQL under the hood.
In this first version of the API's querying capabilities, NQL is exposed without modifications.
The later sections contain example queries for various data types:
- Float
- Float series
- String
- Tags
- Artifact version
- System metadata (name, size, state, timestamps, etc.)
NQL syntax#
An NQL query has the following parts: a field name, a field type, an operator, and a value.
For example:
Building a query: Step by step#
The following tabs walk you through constructing each part of a valid query.
Use the field name you specified when assigning the metadata to the run. For the above example, it would be: run["scores/f1"] = f1_score
While usually not necessary, it's safest to enclose the field name in single backquotes (`).
For Neptune to correctly parse the specified field name, you need to provide the Neptune field type immediately after the field name, separated by a colon (:). The field type must be in camel case.[1]
Available types:
The available operators depend on the field type.
Operators | Supported field types |
---|---|
= , != | artifact , bool , experimentState , string , int , float , floatSeries aggregates |
> , >= , < , <= | int , float , floatSeries aggregates |
CONTAINS | string , stringSeries , stringSet |
EXISTS | Any |
NOT | Negates other operators or clauses. See Negation ↓ |
It's usually possible to enter the plain value without quotes, but in some cases double quotes (") are necessary, for example, if the value contains a space.
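The four parts described above (field name, field type, operator, value) can be assembled mechanically. The following is a minimal sketch, not part of the Neptune API: nql_clause is a hypothetical helper that always backquotes the field name and always wraps string values in double quotes. It makes no attempt to escape double quotes inside the value.

```python
def nql_clause(field, field_type, operator, value):
    """Build a single NQL clause (hypothetical helper, not a Neptune function)."""
    if isinstance(value, str):
        # Double quotes are needed when the value contains a space,
        # so this sketch simply quotes every string value.
        value_text = f'"{value}"'
    else:
        value_text = str(value)
    # Field name in backquotes, then a colon and the camel-case field type.
    return f"`{field}`:{field_type} {operator} {value_text}"


f1_clause = nql_clause("scores/f1", "float", ">=", 0.85)
tag_clause = nql_clause("sys/tags", "stringSet", "CONTAINS", "some-tag")
```

Either string can then be passed to the query argument of fetch_runs_table().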
Multi-clause (complex) queries#
You can also build a complex query, in which multiple conditions are joined by logical operators.
Surround the clauses with () and use AND or OR to join them.
query='(last(`metrics/acc`:floatSeries) >= 0.85) AND (`learning_rate`:float = 0.002)'
Note that each run is matched against the full query individually.
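When the clauses are built separately, the joining step can be sketched as follows. The helper name join_clauses is an assumption made for illustration; it only wraps each clause in parentheses and inserts the logical operator.

```python
def join_clauses(clauses, logical_operator="AND"):
    """Wrap each clause in parentheses and join them (hypothetical helper)."""
    if logical_operator not in ("AND", "OR"):
        raise ValueError("logical_operator must be 'AND' or 'OR'")
    return f" {logical_operator} ".join(f"({clause})" for clause in clauses)


query = join_clauses(
    [
        "last(`metrics/acc`:floatSeries) >= 0.85",
        "`learning_rate`:float = 0.002",
    ]
)
```

The resulting string is the same as the multi-clause query shown above.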
Negation#
You can use NOT in front of operators or clauses.
The following are equivalent and would exclude runs that have "blobfish" in their name:
You can also negate joined clauses. This requires enclosing them with parentheses:
NOT (`sys/name`:string CONTAINS blobfish AND `sys/failed`:bool = True)
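Negating a group of joined clauses can be sketched as a small string helper. The name negate is hypothetical, and the sketch assumes that NQL also accepts parentheses around a single clause, which the text above doesn't state explicitly.

```python
def negate(clause):
    """Negate a clause or a group of joined clauses (hypothetical helper)."""
    # Parentheses are required when the clause contains AND or OR.
    return f"NOT ({clause})"


joined = "`sys/name`:string CONTAINS blobfish AND `sys/failed`:bool = True"
query = negate(joined)
```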
More examples#
Models small enough to be used on mobile that have decent test accuracy#
run = neptune.init_run()
run["model_info/size_MB"] = 45
for epoch in epochs:
    # training loop
    test_acc = ...
    run["test/acc"] = test_acc
All of Jackie's runs from the current exploration task#
run = neptune.init_run(
    api_token="...",  # (1)
    tags=["exploration", "pretrained"],
)
1. The API token of Jackie's account, passed to this argument or set in the NEPTUNE_API_TOKEN environment variable.
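A query for this scenario could combine the run owner with one of the tags set in the logging code. The sketch below rests on two assumptions: that Jackie's username is stored as jackie in the sys/owner field, and that exploration is the tag that identifies the current task.

```python
# Assumed values: adjust the username and tag to match your workspace.
owner_clause = '`sys/owner`:string = "jackie"'
tag_clause = '`sys/tags`:stringSet CONTAINS "exploration"'

# Each clause goes in parentheses, joined with AND.
jackie_exploration_query = f"({owner_clause}) AND ({tag_clause})"
```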
All failed runs from the start of the year#
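One way to express this is to combine the sys/failed field with a lower bound on sys/creation_time. The following is a sketch under assumptions: the function name is hypothetical, the year boundary is taken in UTC, and the > operator excludes a run created at exactly midnight on January 1.

```python
from datetime import datetime, timezone


def failed_since_start_of_year(now=None):
    """Build a query for failed runs created after January 1 (UTC) of the current year."""
    now = now or datetime.now(timezone.utc)
    year_start = datetime(now.year, 1, 1, tzinfo=timezone.utc)
    # ISO 8601 timestamp with Z as the UTC designator.
    stamp = year_start.strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "(`sys/failed`:bool = True) AND "
        f'(`sys/creation_time`:datetime > "{stamp}")'
    )
```

The returned string can be passed to the query argument of fetch_runs_table().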
Float#
If you assigned a float value to a field, the field type is Float and can be queried as follows:
project.fetch_runs_table(
    query='`f1_score`:float < 0.50'
)
In this case, the logging code for a run matching the expression could be something like run["f1_score"] = 0.48.
Float series#
If you created a series of values with append() or extend(), use an aggregate function to obtain a value that characterizes the series.
The following statistical functions are supported:
average()
last()
max()
min()
variance()
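As an illustration, an aggregate wraps the typed field name, as in the multi-clause example earlier on this page. The helper below is a hypothetical sketch that checks the function name against the list above before building the clause.

```python
SUPPORTED_AGGREGATES = {"average", "last", "max", "min", "variance"}


def aggregate_clause(function, field, operator, value):
    """Build a clause on a float series aggregate (hypothetical helper)."""
    if function not in SUPPORTED_AGGREGATES:
        raise ValueError(f"Unsupported aggregate function: {function}")
    # The aggregate function wraps the backquoted field name and its type.
    return f"{function}(`{field}`:floatSeries) {operator} {value}"


query = aggregate_clause("last", "metrics/acc", ">=", 0.85)
```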
String#
You can filter either by the full string, or use the CONTAINS operator to match substrings.
project.fetch_runs_table(
    query='`sys/name`:string CONTAINS "blobfish"'
)
See also: Name.
String series#
For StringSeries fields, only the last logged entry is considered. For example, the last line of logged system metrics (stderr or stdout).
Tags#
When you add tags at run creation or later through the web app, they're stored as a StringSet in the auto-created sys/tags field. To filter by one or more tags, this is the field you need to access.
(`sys/tags`:stringSet CONTAINS "tag1") OR (`sys/tags`:stringSet CONTAINS "tag2")
(`sys/tags`:stringSet CONTAINS "tag1") AND (`sys/tags`:stringSet CONTAINS "tag2")
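The first query above matches runs that have either tag, the second only runs that have both. For a longer list of tags, the same pattern can be generated. The helper name tags_query and its match parameter are assumptions made for this sketch.

```python
def tags_query(tags, match="all"):
    """Build a sys/tags filter: match="all" joins with AND, match="any" with OR."""
    if match not in ("all", "any"):
        raise ValueError("match must be 'all' or 'any'")
    joiner = " AND " if match == "all" else " OR "
    return joiner.join(f'(`sys/tags`:stringSet CONTAINS "{tag}")' for tag in tags)


any_tag = tags_query(["tag1", "tag2"], match="any")
all_tags = tags_query(["tag1", "tag2"], match="all")
```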
System metadata#
The system namespace (sys) automatically stores basic metadata about the environment and run. Most of the values are simple string, float, or Boolean values.
Date and time#
Neptune automatically creates three timestamp fields:
- sys/creation_time: When the run object was first created.
- sys/modification_time: When the object was last modified (for example, a tag was removed or some metadata was logged).
- sys/ping_time: When the object last interacted with the Python client library (something was logged or modified through the code).
For the value, you can enter a combined date and time representation with a time-zone specification, in ISO 8601 format:
where Z is the time-zone offset for UTC. You can use a different offset.
`sys/ping_time`:datetime > "2024-02-06T05:00:00Z"
`sys/ping_time`:datetime > "2024-02-06T05:00:00+09"
You can also enter relative time values:
- -2h (last 2 hours)
- -5d (last 5 days)
- -1M (last month)
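If the timestamp comes from a Python datetime object, it can be normalized to UTC before it goes into the query. This is a sketch using only the standard library; to_nql_timestamp is a hypothetical helper name, and the sketch rejects naive datetimes because the query value needs a time-zone specification.

```python
from datetime import datetime, timedelta, timezone


def to_nql_timestamp(moment):
    """Format a timezone-aware datetime as an ISO 8601 UTC string ending in Z."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("moment must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 14:00 at UTC+9 is 05:00 UTC on the same day.
local_time = datetime(2024, 2, 6, 14, 0, 0, tzinfo=timezone(timedelta(hours=9)))
query = f'`sys/ping_time`:datetime > "{to_nql_timestamp(local_time)}"'
```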
Description#
You can pass a description to the description argument of the init_run() function. You can also set the description through the web app, in the run information modal.
To filter by the description:
project.fetch_runs_table(
    query='`sys/description`:string CONTAINS "new data"'
)
ID#
Each run automatically receives a unique Neptune ID, which consists of the project key and a counter.
Use the OR operator to fetch multiple specific runs at once.
project.fetch_runs_table(
    query='(`sys/id`:string = "NLI-35") OR (`sys/id`:string = "NLI-36")'
)
Name#
You can pass a name to the name argument of the init_run() function, or add it later through the run information modal in the web app.
Neptune does not require the name to be unique, but you can use it as a human-friendly identifier.
project.fetch_runs_table(
    query='`sys/name`:string CONTAINS "blobfish"'
)
Owner#
The owner refers to the user or service account that created the run.
project.fetch_runs_table(
    query='`sys/owner`:string CONTAINS "@ml-team"'  # (1)
)
1. In this case, the expression matches all service account names that belong to the workspace ml-team. Learn more: Service accounts
Size#
Size refers to the run object itself, that is, how much storage space it takes up in Neptune.
Note on storage and trash
As long as runs remain in the project trash, they take up space.
By default, trashed objects are excluded from the query. To include them:
project.fetch_runs_table(
    query='`sys/size`:float > 100MB',
    trashed=None,  # (1)
)
1. To include only trashed runs, set trashed to True.
There are a few ways to enter the size value. If you include a space, you need to enclose the value in double quotes (").
The following are equivalent:
State#
If a run has been initialized for logging or read-only access, its state is active as long as the connection to Neptune remains open. Otherwise, the state is inactive.
You can ensure that only closed runs are fetched with the following:
project.fetch_runs_table(
    query='`sys/state`:experimentState = "inactive"'
)
Status (failed)#
If an exception occurred during the run, the run is marked as "Failed". In practice, this means that the sys/failed field is set to True.
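A filter on this field might look like the following sketch. The function name fetch_failed_runs is hypothetical; it expects a project object such as the one returned by neptune.init_project() and only forwards the query to fetch_runs_table().

```python
FAILED_RUNS_QUERY = "`sys/failed`:bool = True"


def fetch_failed_runs(project):
    """Fetch the runs table filtered to runs whose sys/failed field is True."""
    return project.fetch_runs_table(query=FAILED_RUNS_QUERY)
```

With a real project, the call would be fetch_failed_runs(neptune.init_project()).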
Artifact version#
You can filter runs by Artifact hash:
project = neptune.init_project()
project.fetch_runs_table(
    query='`dataset_version`:artifact = 9a113b799082e5fd628be178bedd52837bac24e91f'
)
Other file-related fields are not supported.
1. The type specification is needed to disambiguate between runs that have the same field name but different data types.