How to use Neptune with pandas#

pandas is a popular open source data analysis and manipulation tool. With Neptune, you can log and visualize pandas DataFrames.

Custom dashboard displaying metadata logged with pandas

Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.
  • Have pandas and Neptune installed:

    pip install -U pandas neptune

    Or, with conda:

    conda install -c conda-forge pandas neptune


pandas logging example#

  1. Import Neptune and start a run:

    import neptune
    
    run = neptune.init_run()

    If you haven't set up your credentials, you can log anonymously instead:

      run = neptune.init_run(
          api_token=neptune.ANONYMOUS_API_TOKEN,
          project="common/quickstarts",
      )
      
  2. Create a pandas DataFrame object:

    import pandas as pd
    
    iris_df = pd.read_csv(
        "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv",
        nrows=100,
    )
    

Upload the DataFrame to Neptune as HTML#

from neptune.types import File

run["data/iris-df-html"].upload(File.as_html(iris_df))

Upload the DataFrame to Neptune as CSV#

You can save the DataFrame as a CSV and then upload it to Neptune with the upload() method. This lets you view and sort the data in Neptune's interactive table format.

csv_fname = "iris.csv"
iris_df.to_csv(csv_fname, index=False)
run["data/iris-df-csv"].upload(csv_fname)
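
If you have several DataFrames to log, the save-then-upload pattern above can be wrapped in a small function. The following is a minimal sketch, not part of the Neptune API: `upload_frames_as_csv`, its parameters, and the `data/<name>-csv` field naming are hypothetical, and `run` is assumed to be any object that supports `run["field"].upload(path)`, such as the run created earlier.

```python
from pathlib import Path

import pandas as pd


def upload_frames_as_csv(run, frames: dict[str, pd.DataFrame], out_dir: str) -> list[str]:
    """Save each DataFrame as a CSV file and upload it under data/<name>-csv.

    Hypothetical helper: `run` only needs to support run["field"].upload(path).
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    uploaded_fields = []
    for name, df in frames.items():
        csv_path = out_path / f"{name}.csv"
        # index=False keeps the pandas row index out of the CSV file
        df.to_csv(csv_path, index=False)
        field = f"data/{name}-csv"
        run[field].upload(str(csv_path))
        uploaded_fields.append(field)
    return uploaded_fields


# Example call, assuming iris_df and run from the steps above:
# upload_frames_as_csv(run, {"iris-df": iris_df}, "exported_csv")
```

The function returns the field names it wrote to, which makes it easy to check afterwards which DataFrames were logged.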

To avoid writing to disk, you can write the DataFrame to an in-memory buffer and then upload it with the from_stream() method:

from io import StringIO
from neptune.types import File

csv_buffer = StringIO()
iris_df.to_csv(csv_buffer, index=False)
run["data/iris-df-csv-buffer"].upload(File.from_stream(csv_buffer, extension="csv"))
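
Before uploading, it can be useful to confirm that the in-memory buffer holds the CSV you expect. The sketch below is an illustration under stated assumptions: `dataframe_to_csv_buffer` is a hypothetical helper, and `sample_df` is a tiny made-up stand-in for `iris_df` so that the example runs without network access or a Neptune run.

```python
from io import StringIO

import pandas as pd


def dataframe_to_csv_buffer(df: pd.DataFrame) -> StringIO:
    """Serialize a DataFrame to an in-memory CSV buffer (hypothetical helper)."""
    buffer = StringIO()
    df.to_csv(buffer, index=False)
    return buffer


# Small made-up stand-in for iris_df, so the sketch runs offline
sample_df = pd.DataFrame(
    {"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]}
)

csv_buffer = dataframe_to_csv_buffer(sample_df)

# Sanity check: rewind the buffer and read it back into a new DataFrame
csv_buffer.seek(0)
round_trip_df = pd.read_csv(csv_buffer)

# With an active run, the buffer could then be uploaded as in the snippet above:
# run["data/iris-df-csv-buffer"].upload(File.from_stream(csv_buffer, extension="csv"))
```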

(Optional) Log pandas profile report to Neptune#

You can also log an Exploratory Data Analysis (EDA) report of your dataset to Neptune, using a library that supports pandas, such as ydata-profiling.

Install the library:

pip install ydata-profiling

Then generate the report and upload it as HTML:

from neptune.types import File
from ydata_profiling import ProfileReport

profile = ProfileReport(iris_df, title="Iris Species Dataset Profile Report")
run["data/iris-df-profile-report"].upload(
    File.from_content(profile.to_html(), extension="html")
)

View the DataFrame in Neptune#

  1. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
    
  2. Click the Neptune link in the console output to open the run.

Example link: https://app.neptune.ai/o/common/org/pandas-support/e/pd-1
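
If an exception is raised before stop() is reached, the run may be left open. One way to guard against that is a small context manager that always calls stop() on exit. This is a sketch under stated assumptions: `stop_when_done` is a hypothetical helper, not part of the Neptune API, and it relies only on the stop() method described above.

```python
from contextlib import contextmanager


@contextmanager
def stop_when_done(run):
    """Yield the run, then always call run.stop(), even if an error occurs.

    Hypothetical helper: it only assumes that `run` has a stop() method.
    """
    try:
        yield run
    finally:
        run.stop()


# Example usage, assuming neptune is imported and iris.csv exists on disk:
# with stop_when_done(neptune.init_run()) as run:
#     run["data/iris-df-csv"].upload("iris.csv")
```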

Result

The DataFrame is logged as an HTML or CSV file, depending on the upload method you used.

You can view it in the All metadata section, or create a custom dashboard and display the DataFrame as a widget.