# Storage troubleshooting and tips
In this guide, you'll learn:
- How to identify what data is taking up storage space.
- What you can do to free up space.
- How you can reduce or manage the amount of data that is logged to Neptune.
## Finding what takes up space
### Which projects take up the most space?
To find out which of your projects take up the most space, click your workspace name in the top-left corner → Subscription → Usage.
Here you'll find a table of all your projects, which you can sort by storage used.
You can also get this information by querying the API and sorting the results by project or by user. The following Jupyter notebook demonstrates how to do this.
Notebook: Get storage per project and user 
Trash takes up space
As long as items remain in the trash, they take up storage space. To fully delete the items, you need to clear them from the trash.
- To permanently delete all trashed objects, select Empty trash.
- You can also use the `clear_trash()` or `delete_objects_from_trash()` function from the management API.
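As a minimal sketch of the management API calls above (assuming the `neptune` client library is installed, `NEPTUNE_API_TOKEN` is set in the environment, and the project name is a placeholder; check the management API reference for the exact signatures):

```python
def empty_project_trash(project: str) -> None:
    """Permanently delete all objects in the given project's trash."""
    # Imported lazily so the helper can be defined without the client installed.
    from neptune import management  # pip install neptune

    management.clear_trash(project=project)


# Usage (hypothetical project name):
# empty_project_trash("my-workspace/my-project")
```

Storage is only reclaimed once the objects are actually removed from the trash, so a clean-up job should end with a call like this.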
### Which runs take up the most space?
In the experiments table, you can sort the objects by size.
- Click Add column.
- Select the `sys/size` field, or start typing it until you see it in the search results.
- In the Size column that appears, click the icon on the column and select Sort descending.
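You can reproduce the same ordering outside the UI by fetching the runs table with the `sys/size` column and sorting it. A sketch, assuming the table has already been exported to a pandas DataFrame via `fetch_runs_table(columns=["sys/size"]).to_pandas()`:

```python
import pandas as pd


def largest_runs(runs_df: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Return the top_n runs by stored size (the sys/size field)."""
    return runs_df.sort_values("sys/size", ascending=False).head(top_n)
```

The result gives you a ranked shortlist of candidates for clean-up.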
#### Scenario A: Certain objects take up a lot of space
If only a few run objects take up a lot of storage space, look into what kind of metadata they tend to have logged.
- Files, FileSets, and FileSeries are typically the main suspects. You can check the size of `File` and `FileSet` fields by browsing All metadata of a particular run.
- Simple types, such as `Int`, `Float`, and `String` values, take up very little space and are unlikely to be the problem.
If there are any runs you no longer need, consider deleting them. To keep your storage manageable, make this clean-up a monthly or quarterly activity.
Tip
In the experiments table, you can display old runs by filtering based on the `sys/creation_time` field.
If you want to keep the old runs, check if there are some individual metadata fields that you could delete, such as model checkpoints or large visualizations. You can do this via API by resuming each run and deleting fields or entire namespaces you don't need.
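As a sketch of that resume-and-delete workflow (the project name, run ID, and namespace paths below are placeholders; assumes the `neptune` client is installed and `NEPTUNE_API_TOKEN` is set):

```python
def prune_run(project: str, run_id: str, paths: list) -> None:
    """Resume an existing run and delete the given fields or namespaces."""
    import neptune  # pip install neptune

    run = neptune.init_run(project=project, with_id=run_id)
    for path in paths:
        # pop() removes a single field or an entire namespace subtree.
        run.pop(path)
    run.stop()


# Usage (hypothetical identifiers):
# prune_run("my-workspace/my-project", "RUN-42", ["checkpoints", "visualizations"])
```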
If you have large datasets logged, consider storing them in dedicated cloud storage and only tracking them in Neptune as artifacts.
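For example, a dataset kept in object storage can be tracked as an artifact, so that only its metadata (location and hash) is stored in Neptune. A sketch, with placeholder project and bucket names:

```python
def track_dataset(project: str, dataset_uri: str) -> None:
    """Log a pointer to externally stored data instead of uploading it."""
    import neptune  # pip install neptune

    run = neptune.init_run(project=project)
    # track_files() records file metadata, not the files themselves.
    run["datasets/train"].track_files(dataset_uri)
    run.stop()


# Usage (hypothetical names):
# track_dataset("my-workspace/my-project", "s3://my-bucket/datasets/train")
```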
#### Scenario B: Objects are largely uniform in size
If all runs are similar in size and there are no clear outliers, the next step would be to identify what metadata takes the most space.
- If all the metadata is important for the runs, consider deleting some runs that are no longer needed.
- You can modify the code to avoid logging some of the heavy metadata, or store the data externally (for example, in dedicated cloud storage) and only store a link to them in Neptune.
## Setting up a run deletion script
You can use the `fetch_runs_table()` method to get a list of runs older than a specific age, then delete those runs permanently with the trash management methods.
To set this up for all projects in a workspace, schedule a job (for example, a cron job) to periodically do the following:
- Fetch all projects in the workspace using `management.get_project_list()`.
- Loop through the projects and get a list of all runs in each project using `Project.fetch_runs_table(columns=["sys/creation_time"])`.
- Apply a filter to obtain runs older than a specific age.
- Trash the obtained runs using `management.trash_objects()`.
- Clear the trash using `management.clear_trash()`. The runs continue to take up storage space until they are emptied from the trash.
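Putting those steps together, a sketch of such a clean-up script (the workspace name and age threshold are placeholders; assumes the `neptune` client and pandas are installed and `NEPTUNE_API_TOKEN` is set in the environment):

```python
from datetime import datetime, timedelta, timezone

import pandas as pd


def runs_older_than(runs_df: pd.DataFrame, cutoff: datetime) -> list:
    """Return the sys/id values of runs created before the cutoff."""
    old = runs_df[runs_df["sys/creation_time"] < cutoff]
    return old["sys/id"].tolist()


def clean_workspace(workspace: str, max_age_days: int = 90) -> None:
    """Trash, then permanently delete, runs older than max_age_days."""
    import neptune  # pip install neptune
    from neptune import management

    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    for project_name in management.get_project_list():
        if not project_name.startswith(f"{workspace}/"):
            continue  # skip projects that belong to other workspaces
        project = neptune.init_project(project=project_name, mode="read-only")
        runs_df = project.fetch_runs_table(
            columns=["sys/creation_time"]
        ).to_pandas()
        project.stop()
        old_ids = runs_older_than(runs_df, cutoff)
        if old_ids:
            management.trash_objects(project=project_name, ids=old_ids)
            # Storage is only reclaimed once the trash is emptied.
            management.clear_trash(project=project_name)
```

The age filter is kept as a pure function over the exported DataFrame so it can be tested without touching the API; `clean_workspace` would be the entry point the scheduled job calls.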
Related resources
- Experiments table: Searching and filtering runs
- Delete metadata from a run
- Trash and delete data
- Track artifacts
- Resume a run
- You can also navigate a run programmatically based on its structure. For details, see `run.get_structure()`.