How to use Neptune with pandas#
pandas is a popular open source data analysis and manipulation tool. With Neptune, you can log and visualize pandas DataFrames.
Before you start#
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
- Have pandas and Neptune installed, for example with pip install -U neptune pandas.
pandas logging example#
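- Import Neptune and start a run with neptune.init_run().
- If you haven't set up your credentials, you can log anonymously to the public common/pandas-support project by passing project="common/pandas-support" and api_token=neptune.ANONYMOUS_API_TOKEN to neptune.init_run().
- Create a pandas DataFrame object, for example iris_df, which the following sections upload.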
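Upload the DataFrame to Neptune as HTML#
To upload the DataFrame as an HTML table, convert it with File.as_html() and upload the result with the upload() method.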
Upload the DataFrame to Neptune as CSV#
You can save the DataFrame as a CSV file and then upload it to Neptune with the upload() method. Neptune displays CSV files in an interactive table format, where you can view and sort the data.
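```python
# Save the DataFrame to a local CSV file, without the pandas index column
csv_fname = "iris.csv"
iris_df.to_csv(csv_fname, index=False)

# Upload the file to the "data/iris-df-csv" field of the run
run["data/iris-df-csv"].upload(csv_fname)
```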
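The index=False argument in the code above decides what appears in the uploaded table. This pandas-only sketch (no Neptune calls; it uses a temporary directory and the same hand-built iris_df sample) checks that a file written with index=False reads back unchanged, while the default index=True adds an extra unnamed column.

```python
import os
import tempfile

import pandas as pd

# Hand-built sample of the Iris dataset (column names are illustrative)
iris_df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.3],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3],
        "petal_length": [1.4, 1.4, 1.3, 4.7, 6.0],
        "petal_width": [0.2, 0.2, 0.2, 1.4, 2.5],
        "species": ["setosa", "setosa", "setosa", "versicolor", "virginica"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Written without the index: reading it back gives the original table
    no_index_path = os.path.join(tmp_dir, "iris.csv")
    iris_df.to_csv(no_index_path, index=False)
    roundtrip_df = pd.read_csv(no_index_path)

    # Written with the default index=True: the index becomes an extra column
    with_index_path = os.path.join(tmp_dir, "iris_with_index.csv")
    iris_df.to_csv(with_index_path)
    with_index_columns = list(pd.read_csv(with_index_path).columns)

print(roundtrip_df.equals(iris_df))  # True
print(with_index_columns[0])  # Unnamed: 0
```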
To avoid writing to disk, you can write the DataFrame to an in-memory buffer and upload it with the File.from_stream() method:
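```python
from io import StringIO

from neptune.types import File

# Write the CSV to an in-memory text buffer instead of a file on disk
csv_buffer = StringIO()
iris_df.to_csv(csv_buffer, index=False)

# Upload the buffer contents as a .csv file
run["data/iris-df-csv-buffer"].upload(File.from_stream(csv_buffer, extension="csv"))
```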
(Optional) Log pandas profile report to Neptune#
You can also log an Exploratory Data Analysis (EDA) report of your dataset to Neptune, generated with a library that supports pandas, such as ydata-profiling:
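```python
from neptune.types import File
from ydata_profiling import ProfileReport

# Generate an EDA report for the DataFrame
profile = ProfileReport(iris_df, title="Iris Species Dataset Profile Report")

# Upload the report to the run as an HTML file
run["data/iris-df-profile-report"].upload(
    File.from_content(profile.to_html(), extension="html")
)
```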
View the DataFrame in Neptune#
- To stop the connection to Neptune and sync all data, call the stop() method with run.stop().
- Click the Neptune link in the console output to open the run.
Example link: https://app.neptune.ai/o/common/org/pandas-support/e/PD-1
Result
The DataFrame is logged as an HTML or CSV file, depending on how you uploaded it.
You can view it in the All metadata section of the run, or create a custom dashboard and display the DataFrame as a widget.