Storage troubleshooting and tips

In this guide, you'll learn:

  • How to identify what data is taking up storage space.
  • What you can do to free up space.
  • How you can reduce or manage the amount of data that is logged to Neptune.

Finding what takes up space

Which projects take up the most space?

To find out which of your projects take up the most space, click your workspace name in the top-left corner, then select Subscription → Usage.

Here you'll find a table of all your projects, which you can sort by storage used.

You can also get this information by querying the API and sorting the results by project or by user. The following Jupyter notebook demonstrates how to do this.

View notebook in Colab 
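As a rough illustration of the API route, the following sketch totals storage per project or per user. It assumes the `neptune` Python client (1.x) and that `sys/size` is reported in bytes. The helper names `fetch_run_rows` and `total_size_by` and the project names in the usage comment are hypothetical, not part of Neptune.

```python
import math
from collections import defaultdict


def fetch_run_rows(project_name):
    """Return one dict per run with its owner and size (needs the neptune package)."""
    import neptune  # imported here so total_size_by() works without neptune installed

    project = neptune.init_project(project=project_name, mode="read-only")
    try:
        table = project.fetch_runs_table(columns=["sys/id", "sys/owner", "sys/size"])
        rows = table.to_pandas().to_dict("records")
    finally:
        project.stop()
    for row in rows:
        row["project"] = project_name  # remember which project the run came from
    return rows


def total_size_by(rows, key):
    """Sum sys/size per distinct value of `key`, largest total first."""
    totals = defaultdict(int)
    for row in rows:
        size = row.get("sys/size")
        if size is None or (isinstance(size, float) and math.isnan(size)):
            continue  # skip runs with no recorded size
        totals[row[key]] += size
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical usage (project names are placeholders):
# rows = fetch_run_rows("my-workspace/project-a") + fetch_run_rows("my-workspace/project-b")
# print(total_size_by(rows, "project"))
# print(total_size_by(rows, "sys/owner"))
```

Note that this only counts runs, so model objects and trashed items would need to be fetched separately.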

Trashed runs or models still take up space

Trashed items are not deleted until you manually empty the trash.

To check or empty trash, navigate to a project and select the Trash tab.

  • To permanently delete all listed objects, select Empty trash.

Which runs or models take up the most space?

In the table view for runs, models, or model versions, you can sort the objects by size.

  1. Click Add column.
  2. Select the sys/size field, or start typing it until you see it in the search results.
  3. In the Size column that appears, click the icon on the column and select Sort descending.
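If you prefer to do the same sorting in code, here is a minimal sketch. It assumes you already have one dict per run containing `sys/id` and `sys/size` (for example, from `fetch_runs_table(columns=["sys/id", "sys/size"]).to_pandas().to_dict("records")` in the `neptune` 1.x client), and that sizes are in bytes. The helpers `largest_runs` and `format_size` are hypothetical names used for illustration.

```python
import math


def has_size(row):
    """True if the row carries a usable numeric sys/size value."""
    size = row.get("sys/size")
    return isinstance(size, (int, float)) and not math.isnan(size)


def largest_runs(rows, n=10):
    """Return the n rows with the largest sys/size, in descending order."""
    sized = [row for row in rows if has_size(row)]
    return sorted(sized, key=lambda row: row["sys/size"], reverse=True)[:n]


def format_size(num_bytes):
    """Format a byte count with a binary (1024-based) unit."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


# Hypothetical usage:
# for row in largest_runs(rows, n=5):
#     print(row["sys/id"], format_size(row["sys/size"]))
```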

Scenario A: Certain objects take up a lot of space

If only a few runs or model objects take up a lot of storage space, look into what kind of metadata they tend to have logged.

  • Files, FileSets, and FileSeries are typically the main suspects.

    You can check the size of File and FileSet fields by browsing All metadata of a particular run.

  • Series of floats or strings take some space, but not much.

  • Basic types, such as Int, Float, and String values, take up very little space and are unlikely to be the problem.

If there are any runs you no longer need, consider deleting them. To keep your storage manageable, make this clean-up a monthly or quarterly activity.

Tip

In the runs table, you can display old runs by filtering based on the sys/creation_time field.
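The same filter can be sketched in code. This example assumes one dict per run with `sys/id` and a timezone-aware `sys/creation_time` (as you would typically get from a runs table exported to pandas). The function name `runs_older_than` and the 90-day threshold are illustrative choices, not Neptune defaults.

```python
from datetime import datetime, timedelta, timezone


def runs_older_than(rows, days, now=None):
    """Return the IDs of runs created more than `days` days before `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [row["sys/id"] for row in rows if row["sys/creation_time"] < cutoff]


# Hypothetical usage: list candidates for a quarterly clean-up.
# old_ids = runs_older_than(rows, days=90)
```

Review the returned IDs before deleting anything, since age alone doesn't tell you whether a run is still needed.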

If you want to keep the old runs, check if there are some individual metadata fields that you could delete, such as model checkpoints or large visualizations. You can do this via API by resuming each run and deleting fields or entire namespaces you don't need.
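A minimal sketch of that clean-up, assuming the `neptune` 1.x client, where `init_run(with_id=...)` resumes an existing run and `del run[path]` removes a field or namespace. The helper names, the run IDs, and the field paths in the usage comment are placeholders for illustration.

```python
def delete_fields(run, paths):
    """Delete each field or namespace in `paths` that exists; return what was removed."""
    removed = []
    for path in paths:
        if run.exists(path):  # skip paths this run never logged
            del run[path]
            removed.append(path)
    return removed


def clean_runs(project_name, run_ids, paths):
    """Resume each run by ID and delete the given fields (needs the neptune package)."""
    import neptune  # imported here so delete_fields() can be tried without neptune

    for run_id in run_ids:
        run = neptune.init_run(project=project_name, with_id=run_id)
        try:
            print(run_id, "removed:", delete_fields(run, paths))
        finally:
            run.stop()  # flush the deletions and close the connection


# Hypothetical usage (IDs and paths are placeholders):
# clean_runs("my-workspace/my-project", ["CLS-12", "CLS-13"], ["model/checkpoints", "visuals"])
```

Deleting a field can't be undone from the client, so it may be worth trying this on a single run first.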

If you have large datasets logged, consider storing them in dedicated cloud storage and only tracking them in Neptune as artifacts.
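As a sketch of the artifact approach, the helper below records dataset locations with `track_files()`, which in the `neptune` 1.x client logs metadata about the files rather than uploading their contents. The helper name `track_datasets`, the `datasets/` namespace, and the bucket paths are assumptions made for this example.

```python
def track_datasets(run, locations):
    """Track each dataset location as an artifact field; return the field paths used."""
    fields = []
    for name, location in locations.items():
        field = f"datasets/{name}"
        run[field].track_files(location)  # records metadata only, no file upload
        fields.append(field)
    return fields


# Hypothetical usage with a Neptune run (bucket paths are placeholders):
# import neptune
# run = neptune.init_run(project="my-workspace/my-project")
# track_datasets(run, {"train": "s3://my-bucket/train", "val": "s3://my-bucket/val"})
# run.stop()
```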

Scenario B: Objects are largely uniform in size

If all runs are similar in size and there are no clear outliers, the next step would be to identify what metadata takes the most space.

  • If all the metadata is important for the runs, consider deleting some runs that are no longer needed.
  • You can modify the code to avoid logging some of the heavy metadata, or store the data externally (for example, in dedicated cloud storage) and only store a link to it in Neptune.
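One way to apply the second option in your logging code is sketched below: small files are uploaded as usual, while larger ones are replaced by a string field holding a link. It assumes the `neptune` 1.x field API (`run[field].upload(path)` and `run[field] = value`). The function name `log_file_or_link`, the `_url` suffix, and the 10 MB threshold are arbitrary choices for illustration.

```python
import os


def log_file_or_link(run, field, local_path, external_url, max_bytes=10 * 1024 * 1024):
    """Upload the file if it is small enough; otherwise store only a link to it."""
    if os.path.getsize(local_path) <= max_bytes:
        run[field].upload(local_path)
        return "uploaded"
    # The file itself is assumed to already live at external_url.
    run[f"{field}_url"] = external_url
    return "linked"


# Hypothetical usage (paths are placeholders):
# log_file_or_link(run, "model/checkpoint", "ckpt.pt", "s3://my-bucket/ckpt.pt")
```

This sketch doesn't copy anything to external storage; it only decides what gets logged to Neptune.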
