App version: 3.20251201

Configure a custom code widget

In reports, you can write your own Python code and render it as an interactive widget. The code is executed in a secure sandbox and the widget reacts to viewer actions, such as filtering and selecting runs.

The custom code widget can render any output that is compatible with Jupyter and IPython, for example:

pandas tables
Matplotlib plots
Plotly charts
HTML and JavaScript elements
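As a rough illustration of what compatibility with Jupyter and IPython can mean, IPython's rich display protocol looks for methods such as _repr_html_() on an object. The sketch below assumes that the widget honors this protocol the way a notebook does. The Scorecard class and its fields are hypothetical.

```python
import html


class Scorecard:
    """Hypothetical object that renders itself as HTML via IPython's display protocol."""

    def __init__(self, title: str, scores: dict[str, float]):
        self.title = title
        self.scores = scores

    def _repr_html_(self) -> str:
        # IPython calls this method, if present, to obtain an HTML representation.
        rows = "".join(
            f"<tr><td>{html.escape(name)}</td><td>{value:.2f}</td></tr>"
            for name, value in self.scores.items()
        )
        return f"<h4>{html.escape(self.title)}</h4><table>{rows}</table>"


card = Scorecard("Eval summary", {"accuracy/match": 0.8})
print(card._repr_html_())
```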

note

This is an experimental feature that is available only to select self-hosted customers. To participate in the early access program and help us improve it, contact Neptune support.

To view and create custom code widgets, first enable them:

1. In your account settings, go to Experimental features.
2. Enable Custom code widgets.

Create a widget

To add a custom code widget to your report:

1. In the top toolbar, click New widget and select Custom code.
2. (Optional) Change the widget title.

Define the widget code

The widget code consists of three parts:

=== TYPE DEFINITIONS & CONTEXT VARIABLES ===

This is the read-only code that represents the current state of the report.

In your custom code, reference the variables and type definitions included in this section to make the widget respond to viewer actions.
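To make the idea of context variables more concrete, here is a minimal sketch that runs outside the widget. The dataclasses are simplified stand-ins modeled on the attributes that the code on this page reads (project, runs, experiments, data_sources). The real type definitions may contain more fields, and the project and run names are hypothetical.

```python
from dataclasses import dataclass, field


# Simplified stand-ins for the read-only type definitions; the real
# definitions in the widget may have more fields.
@dataclass
class RunSet:
    project: str
    runs: list[str] = field(default_factory=list)


@dataclass
class ExperimentSet:
    project: str
    experiments: list[str] = field(default_factory=list)


@dataclass
class ReportContext:
    data_sources: dict[str, object] = field(default_factory=dict)


# Hypothetical report state: one run set and one experiment set.
report_context = ReportContext(
    data_sources={
        "a": RunSet(project="team/evals", runs=["EVAL-1", "EVAL-2"]),
        "b": ExperimentSet(project="team/evals", experiments=["baseline"]),
    }
)


def summarize(context: ReportContext) -> list[str]:
    """Describe what each data source currently contains."""
    lines = []
    for source in context.data_sources.values():
        if isinstance(source, RunSet):
            lines.append(f"{source.project}: {len(source.runs)} runs")
        elif isinstance(source, ExperimentSet):
            lines.append(f"{source.project}: {len(source.experiments)} experiments")
    return lines


print(summarize(report_context))
```

Inside the widget, report_context is already defined, so only the body of a function like summarize() would be needed.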

=== EXAMPLE FUNCTION ===

The widget comes with a predefined function that returns a runs table with the attributes of the sys namespace for the currently selected runs. The runs table is returned as a pandas dataframe and rendered as an HTML table inside the widget.

See the predefined function

def main():
    # Collect one table per data source selected in the report.
    dfs = []
    for run_set in report_context.data_sources.values():
        if isinstance(run_set, RunSet) and len(run_set.runs) > 0:
            # Match the selected runs by ID using a regex alternation.
            df = nq_runs.fetch_runs_table(
                project=run_set.project,
                runs=Filter.matches("sys/id", "|".join(run_set.runs)),
                attributes="sys/.*",
            )
            dfs.append(df)
        elif isinstance(run_set, ExperimentSet) and len(run_set.experiments) > 0:
            df = nq.fetch_experiments_table(
                project=run_set.project,
                experiments=run_set.experiments,
                attributes="sys/.*",
            )
            dfs.append(df)

    # pd.concat() raises ValueError on an empty list, so return an empty
    # table when nothing is selected.
    if not dfs:
        return pd.DataFrame()

    run_table_df = pd.concat(dfs, axis="index")
    return run_table_df

Replace this function with your own code. See examples.
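The predefined function selects runs by joining their IDs into a regular-expression alternation that it passes to Filter.matches(). The following sketch shows only that string-building step, checked with Python's re module. The run IDs are hypothetical, and how the pattern is matched on the Neptune side is not covered here.

```python
import re

run_ids = ["EVAL-1", "EVAL-2"]  # hypothetical run IDs

# Same string-building step as in the predefined function.
pattern = "|".join(run_ids)
print(pattern)  # EVAL-1|EVAL-2

# Checked locally with Python's re module only.
candidates = ["EVAL-1", "EVAL-2", "EVAL-3"]
matched = [rid for rid in candidates if re.fullmatch(pattern, rid)]
print(matched)  # ['EVAL-1', 'EVAL-2']
```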

=== HOW TO EXECUTE LOCALLY ===

The line if __name__ == "__main__": ensures safe importing of the main module. For details, see the Python documentation.

This part is necessary to execute the code locally.
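For reference, the guard works as in the following minimal sketch. The main() body is a placeholder rather than the widget's actual code.

```python
def main() -> str:
    # Placeholder for the widget's main() function.
    return "widget output"


if __name__ == "__main__":
    # Runs only when the file is executed directly (for example,
    # python widget.py), not when it is imported from another module.
    print(main())
```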

Configure widget inputs

To let report viewers interact with the widget, define inputs. For example, you can add search, sorting, or filtering directly in the widget.

To add a widget input:

1. In Widget inputs, click Add input.
2. In Type, select one of the following options:
   - Text
   - Number
   - Slider
   - Dropdown
   - Checkbox
3. In Label, provide a descriptive identifier of the input for the report viewers.

The value of the ID field is auto-generated. Use it to reference the input in your custom code.
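A minimal sketch of reading an input by its ID follows. It mirrors the access pattern used in the interactive table example on this page (widget_context.inputs.get(...) and current_value). The SimpleNamespace object is a stand-in so that the snippet runs outside the widget, and the threshold ID and the read_input helper are hypothetical.

```python
from types import SimpleNamespace

# Stand-in for the widget_context object that the widget provides.
# The "threshold" input ID and its value are hypothetical.
widget_context = SimpleNamespace(
    inputs={"threshold": SimpleNamespace(current_value="0.75")}
)


def read_input(input_id: str, default: float) -> float:
    """Return the viewer-selected value, or a default if the input is not defined."""
    widget_input = widget_context.inputs.get(input_id)
    if widget_input is None:
        return default
    return float(widget_input.current_value)


print(read_input("threshold", default=0.5))  # 0.75
print(read_input("missing", default=0.5))    # 0.5
```

Falling back to a default keeps the widget working even before the input is added to the report.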

Supported packages

The custom code widget requires Python 3.13 and supports the following packages:

httpx
Jinja2
matplotlib
neptune-query
numpy
pandas
plotly
pydantic
requests
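Before developing a widget locally, it can help to check which of these packages are installed in your environment. The sketch below uses only the standard library. The installed_versions helper is hypothetical, and the names are taken from the list above on the assumption that they match the distribution names on PyPI.

```python
from importlib import metadata
from typing import Optional

SUPPORTED = [
    "httpx", "Jinja2", "matplotlib", "neptune-query", "numpy",
    "pandas", "plotly", "pydantic", "requests",
]


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(SUPPORTED).items():
    print(f"{name}: {version or 'not installed'}")
```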

Execute the code locally

To grab the existing code and develop it in your local environment, use the Copy code button.

Note that for each widget input, the copied code contains a hard-coded value that is based on the value currently selected in the report. As a result, you get the same output both locally and when executing code in the Neptune app.

Move or duplicate a widget

You can move the custom code widget to a new or existing report. You can also duplicate the widget in the report that you're currently working on.

To access the options, open the widget menu.

Examples

The following examples show custom code widgets optimized for analyzing LLM eval results.

To recreate these widgets in your own project, first log mock data using the script:

Download the CSV file with evaluation data.
Save the file as evals.csv in the same directory as the script.

See the script for logging mock data

This example script requires Python version 3.8 or higher, plus the following dependencies:

neptune-scale
pandas

import csv
import os
import random
import sys
import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd
from neptune_scale import Run

PROMPTS = [
    "Implement a function that finds the maximum subarray sum using Kadane's algorithm.",
    "Write a function that checks if a binary tree is balanced.",
    "Create a function that finds the longest palindromic substring.",
    "Implement a basic LRU cache with get and put operations.",
    "Write a function that merges two sorted linked lists.",
]

OUTPUTS = [
    """def max_subarray(nums):
    max_sum = curr_sum = nums[0]
    for n in nums[1:]:
        curr_sum = max(n, curr_sum + n)
        max_sum = max(max_sum, curr_sum)
    return max_sum""",
    """def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)""",
    """class BST:
    def __init__(self):
        self.root = None
    def insert(self, val):
        if not self.root:
            self.root = TreeNode(val)""",
    """def longest_common_subsequence(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    return dp[m][n]""",
]

COT_TEMPLATE = """### Understanding the problem
> Analyzing: "{prompt}"
> Identifying requirements and constraints

### Planning the solution
> Designing algorithm approach
> Considering time/space complexity

### Implementing the code
> Writing solution step by step

### Conclusion
> {verdict}"""

TYPES = ["grader"] * 4 + ["match"] * 3 + ["sampling"] * 3
MODELS = ["gpt-4o", "claude-3.5-sonnet"]


def generate_rows(step: int, total_steps: int) -> pd.DataFrame:
    progress = step / total_steps
    rows = []
    for i in range(10):
        eval_type = TYPES[i]
        model = MODELS[i % 2]
        prompt = random.choice(PROMPTS)
        output = random.choice(OUTPUTS)

        if eval_type == "grader":
            grade = str(min(5, max(0, int(2 + progress * 3 + random.uniform(-1, 1)))))
        elif eval_type == "match":
            grade = str(1 if random.random() < (0.3 + progress * 0.5) else 0)
        else:
            n = random.randint(1, 4)
            scores = [
                1 if random.random() < (0.4 + progress * 0.4) else -1 for _ in range(n)
            ]
            grade = str(scores)

        verdict = (
            "The implementation correctly solves the problem."
            if random.random() < progress
            else "The output does not match the requested functionality."
        )
        cot = COT_TEMPLATE.format(prompt=prompt[:50], verdict=verdict)

        rows.append(
            {
                "type": eval_type,
                "input_prompt": prompt,
                "generated_output": output,
                "grader_model": model,
                "grade": grade,
                "generated_cot": cot,
            }
        )
    return pd.DataFrame(rows)


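def calculate_metrics(df: pd.DataFrame) -> dict:
    # Imported locally so that this function stays self-contained.
    from ast import literal_eval

    # Share of exact-match evals that were graded 1.
    match_df = df[df["type"] == "match"]
    accuracy = match_df["grade"].astype(int).mean()

    # Sampling grades are stringified lists such as "[1, -1]". Parse them with
    # literal_eval(), which accepts only Python literals, instead of eval().
    sampling_df = df[df["type"] == "sampling"]
    all_scores = []
    for g in sampling_df["grade"]:
        all_scores.extend(literal_eval(g))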
    pos = sum(1 for s in all_scores if s == 1)
    neg = sum(1 for s in all_scores if s == -1)
    total = pos + neg

    return {
        "accuracy/match": accuracy,
        "sampling_ratio/positive": pos / total if total > 0 else 0.5,
        "sampling_ratio/negative": neg / total if total > 0 else 0.5,
    }


def main():
    required = ["NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT"]
    if not all(k in os.environ for k in required):
        sys.exit(
            "Error: Set NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables"
        )

    total_steps = 20
    experiment_name = f"evals-{datetime.now():%Y-%m-%d-%H-%M-%S}"

    with tempfile.TemporaryDirectory() as tmpdir:
        with Run(experiment_name=experiment_name, run_id=experiment_name) as run:
            for step in range(total_steps):
                df = generate_rows(step, total_steps)
                metrics = calculate_metrics(df)
                run.log_metrics(data=metrics, step=step)

                csv_path = Path(tmpdir) / f"evals_step_{step}.csv"
                df.to_csv(csv_path, index=False, quoting=csv.QUOTE_ALL)
                run.log_files(files={"evals/checkpoint": str(csv_path)}, step=step)

            print(f"Run URL: {run.get_run_url()}")


if __name__ == "__main__":
    main()
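The script stores the grades of sampling evals as stringified lists, for example "[1, -1, 1]". Code that reads the logged CSV files back has to parse those strings. A minimal sketch of one way to do that with ast.literal_eval() follows. The sampling_ratios helper is hypothetical and only mirrors the ratio calculation from the script.

```python
import ast


def sampling_ratios(grades: list[str]) -> tuple[float, float]:
    """Return (positive, negative) ratios from stringified score lists."""
    scores = []
    for grade in grades:
        # literal_eval() accepts only Python literals, unlike eval().
        scores.extend(ast.literal_eval(grade))
    pos = sum(1 for s in scores if s == 1)
    neg = sum(1 for s in scores if s == -1)
    total = pos + neg
    if total == 0:
        return 0.5, 0.5  # same fallback as in the logging script
    return pos / total, neg / total


print(sampling_ratios(["[1, -1, 1]", "[1]"]))  # (0.75, 0.25)
```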

Visualize multi-dimensional data

This example shows how to visualize multi-dimensional data with plotly and neptune-query:

The returned scatter plot uses symbols to distinguish between experiments.
Report viewers can choose which attributes to display on the Y-axis and change their color.

[Figure: Multi-dimensional data visualization]

See the widget code

from neptune_query.filters import Filter
import plotly.express as px


def fetch_filtered_series(widget_context: WidgetContext, report_context: ReportContext) -> pd.DataFrame:
    dfs = []
    for run_set in report_context.data_sources.values():
        if isinstance(run_set, RunSet):
            df = nq_runs.fetch_metrics(
                project=run_set.project,
                runs=Filter.matches("sys/id", "|".join(run_set.runs)),
                attributes=["sys/custom_run_id", "accuracy/match", "sampling_ratio/positive", "sampling_ratio/negative"],
            )
            df["run_or_exp"] = df.index.get_level_values(0)
            dfs.append(df)
        elif isinstance(run_set, ExperimentSet):
            df = nq.fetch_metrics(
                project=run_set.project,
                experiments=run_set.experiments,
                attributes=["sys/custom_run_id", "accuracy/match", "sampling_ratio/positive", "sampling_ratio/negative"],
            )
            df["run_or_exp"] = df.index.get_level_values(0)
            dfs.append(df)
        else:
            raise NotImplementedError(f"Unexpected {type(run_set)=}")

    run_table_df = pd.concat(dfs, axis="index")
    return run_table_df.reset_index()


def main():
    df = fetch_filtered_series(widget_context, report_context)
    fig = px.scatter(
        df,
        x="step",
        y="accuracy/match",
        color="sampling_ratio/negative",
        symbol="run_or_exp",
    )
    fig.update_layout(legend=dict(
        yanchor="top",
        y=0.99,
        xanchor="left",
        x=0.01
    ))
    fig.show()
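The widget code hard-codes the Y-axis column. One possible way to let report viewers choose it is to read a widget input and validate its value before passing it to px.scatter(). The sketch below is an assumption about how that could be wired: the y_attribute input ID and the pick_y_column helper are hypothetical, and SimpleNamespace stands in for the widget-provided context.

```python
from types import SimpleNamespace

ALLOWED_Y = ["accuracy/match", "sampling_ratio/positive", "sampling_ratio/negative"]

# Stand-in for the widget-provided context; "y_attribute" is a hypothetical input ID.
widget_context = SimpleNamespace(
    inputs={"y_attribute": SimpleNamespace(current_value="sampling_ratio/positive")}
)


def pick_y_column(default: str = "accuracy/match") -> str:
    """Return the viewer-selected Y column, falling back to the default."""
    selected = widget_context.inputs.get("y_attribute")
    value = default if selected is None else str(selected.current_value)
    # Ignore values that are not among the fetched attributes.
    return value if value in ALLOWED_Y else default


y_column = pick_y_column()
print(y_column)  # sampling_ratio/positive
```

In the widget, the returned value would replace the hard-coded y= argument of px.scatter().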

tip

To speed up your queries, apply filters using neptune-query and narrow down the results to the specific attributes that you want to display.
When loading series that have over 1000 steps, choose fetch_metric_buckets() over fetch_metrics().
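To illustrate why bucketing helps with long series, the following sketch averages a series into a fixed number of step ranges. It doesn't call neptune-query. The bucket_means helper and its equal-width bucketing rule are assumptions made for illustration, not the library's algorithm.

```python
def bucket_means(
    steps: list[int], values: list[float], n_buckets: int
) -> list[tuple[int, float]]:
    """Average values into equal-width step ranges; returns (first_step, mean) pairs."""
    lo, hi = min(steps), max(steps)
    # Ceiling division, so that at most n_buckets buckets are produced.
    width = max(1, -(-(hi - lo + 1) // n_buckets))
    grouped: dict[int, list[float]] = {}
    for step, value in zip(steps, values):
        grouped.setdefault((step - lo) // width, []).append(value)
    return [
        (lo + idx * width, sum(vals) / len(vals))
        for idx, vals in sorted(grouped.items())
    ]


steps = list(range(2000))
values = [float(s) for s in steps]
buckets = bucket_means(steps, values, n_buckets=4)
print(buckets)  # [(0, 249.5), (500, 749.5), (1000, 1249.5), (1500, 1749.5)]
```

Here, 2000 points are reduced to 4, which is the kind of reduction that keeps a plot responsive.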

Interactive table

This example shows an interactive table that lets report viewers analyze individual eval samples:

Use neptune-query to load a CSV file selected by the report viewer.
Display the selected file as an HTML table styled with CSS.
Use JavaScript to hide long chain-of-thought strings and make them visible on button click.

[Figure: Interactive table]

See the widget code

def esc_html(text):
    return str(text).replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;').replace("'", '&#39;')

def esc_attr(text):
    return str(text).replace('&', '&amp;').replace('"', '&quot;').replace("'", '&#39;').replace('<', '&lt;').replace('>', '&gt;')

HTML_TEMPLATE = """<!DOCTYPE html>
    <html>
    <head>
        <style>
body {{ font-family: sans-serif; margin: 20px; }}
table {{ width: 100%; border-collapse: collapse; }}
th {{ background: #333; color: white; padding: 8px; text-align: left; font-size: 12px; }}
td {{ padding: 8px; border-bottom: 1px solid #ddd; font-size: 12px; position: relative; }}
tbody tr:hover {{ background: #f5f5f5; }}
.input-cell {{ max-width: 200px; word-wrap: break-word; white-space: normal; }}
.copyable-cell {{ position: relative; }}
.copy-btn {{ display: none; position: absolute; right: 8px; top: 50%; transform: translateY(-50%); cursor: pointer; border: 1px solid #ccc; background: white; padding: 4px 8px; font-size: 11px; }}
.copy-btn:hover {{ background: #f0f0f0; }}
.copyable-cell:hover .copy-btn {{ display: block; }}
.modal {{ display: none; position: fixed; z-index: 1000; left: 0; top: 0; width: 100%; height: 100%; background: rgba(0,0,0,0.5); }}
.modal-content {{ background: white; margin: 5% auto; padding: 20px; width: 80%; max-width: 800px; max-height: 80vh; overflow-y: auto; }}
.modal-header {{ display: flex; justify-content: space-between; margin-bottom: 15px; }}
.modal-title {{ margin: 0; font-size: 13px; font-weight: normal; }}
.close {{ cursor: pointer; font-size: 24px; background: none; border: none; }}
.cot-content {{ font-size: 11px; line-height: 1.4; }}
button {{ cursor: pointer; border: 1px solid #ccc; background: white; padding: 4px 8px; margin: 0 2px; font-size: 11px; }}
button:hover {{ background: #f0f0f0; }}
pre {{ background: #f5f5f5; padding: 8px; overflow-x: auto; margin: 0; }}
code {{ font-family: monospace; font-size: 11px; }}
.toast {{ display: none; position: fixed; bottom: 20px; right: 20px; background: #4caf50; color: white; padding: 10px 16px; border-radius: 4px; z-index: 2000; }}

@media (prefers-color-scheme: dark) {{
  body {{
    background: #0f1115;
    color: #e6e6e6;
  }}

  table {{
    background: #141821;
  }}

  td {{
    background: #141821;
    border-bottom: 1px solid #2a3140;
  }}

  tbody tr:hover,
  tbody tr:hover td {{
    background: #1b2230;
  }}

  pre {{
    background: #0b0e14;
  }}

  button,
  .copy-btn {{
    background: #141821;
    color: #e6e6e6;
    border-color: #2a3140;
  }}

  button:hover,
  .copy-btn:hover {{
    background: #1b2230;
  }}

  .modal-content {{
    background: #141821;
    color: #e6e6e6;
    border: 1px solid #2a3140;
  }}
}}
        
        </style>
    </head>
    <body>
<div><strong>Checkpoint {checkpoint}</strong> • Model: {model} • {count} evaluations</div><br>
            <table>
<thead><tr><th>Type</th><th>Input Prompt</th><th>Generated Output</th><th>Model</th><th>Grade</th><th>CoT</th></tr></thead>
<tbody>{rows}</tbody>
            </table>
        
        <div id="cotModal" class="modal">
            <div class="modal-content">
<div class="modal-header"><h3 id="modalTitle" class="modal-title">Chain of Thought</h3><button class="close" onclick="closeModal()">&times;</button></div>
                    <div id="cotContent" class="cot-content"></div>
            </div>
        </div>
        
<div id="toast" class="toast">Copied!</div>
        
        <script>
function unesc(t) {{ var d = document.createElement('div'); d.innerHTML = t; return d.textContent || d.innerText || ''; }}
function copy(t) {{ var a = document.createElement('textarea'); a.value = t; a.style.position = 'fixed'; a.style.left = '-9999px'; document.body.appendChild(a); a.select(); document.execCommand('copy'); document.body.removeChild(a); var x = document.getElementById('toast'); x.style.display = 'block'; setTimeout(function() {{ x.style.display = 'none'; }}, 2000); }}
function copyCell(btn) {{ var c = btn.closest('td'); copy(unesc(c.getAttribute('data-text'))); }}
function copyCoT(btn) {{ var c = btn.closest('td'); copy(unesc(c.getAttribute('data-cot'))); }}
function showCoT(btn) {{ var c = btn.closest('td'); var modal = document.getElementById('cotModal'); var prompt = c.getAttribute('data-prompt'); document.getElementById('modalTitle').textContent = 'CoT: ' + prompt + '...'; document.getElementById('cotContent').innerHTML = '<pre><code>' + unesc(c.getAttribute('data-cot')).replace(/</g, '&lt;').replace(/>/g, '&gt;') + '</code></pre>'; modal.style.display = 'block'; }}
function closeModal() {{ document.getElementById('cotModal').style.display = 'none'; }}
window.onclick = function(e) {{ if (e.target.id === 'cotModal') closeModal(); }}
document.addEventListener('keydown', function(e) {{ if (e.key === 'Escape') closeModal(); }});
        </script>
    </body>
</html>"""

ROW_TEMPLATE = """<tr>
<td>{type}</td>
<td class="input-cell copyable-cell" data-text="{input_prompt_attr}">{input_prompt}<button class="copy-btn" onclick="copyCell(this)">Copy</button></td>
<td class="copyable-cell" data-text="{output_attr}"><pre><code>{output}</code></pre><button class="copy-btn" onclick="copyCell(this)">Copy</button></td>
<td>{grader_model}</td>
<td><pre><code>{grade}</code></pre></td>
<td data-cot="{cot}" data-prompt="{prompt_preview}"><button onclick="copyCoT(this)">Copy</button><button onclick="showCoT(this)">View</button></td>
</tr>"""

def download_file_at_step(report_context: ReportContext, checkpoint_step: int):
    # NOTE: this example only reads from the first run set activated in the report
    data_source = list(report_context.data_sources.values())[0]
    
    if data_source.data_source_type == "experiment_set":
        series = nq.fetch_series(
            project=data_source.project,
            experiments=data_source.experiments[0],
            attributes="evals/checkpoint",
            step_range=(checkpoint_step, checkpoint_step)
        )
    else:
        raise Exception("This example only supports experiments")
        
    file_path_df = nq.download_files(files=series, destination="checkpoint.csv")
    df = pd.read_csv(file_path_df.iloc[0]["evals/checkpoint"])
    return df

def main():
    checkpoint_input = widget_context.inputs.get("checkpoint")    
    checkpoint_step = 0 if checkpoint_input is None else int(checkpoint_input.current_value)    
    try:
        df = download_file_at_step(report_context, checkpoint_step)
    except Exception as e:
        msg = f"Failed to load evals at {checkpoint_step=}:\n{e}"
        raise ValueError(msg) from e
    
    grader_model_input = widget_context.inputs.get("grader_model")
    grader_model_filter = None if grader_model_input is None else grader_model_input.current_value
    if grader_model_filter:
        df = df[df['grader_model'] == grader_model_filter]
    else:
        df = df[df['grader_model'].notna() & (df['grader_model'] != '')]
    
    rows = ''.join([
        ROW_TEMPLATE.format(
            type=esc_html(row['type']),
            input_prompt=esc_html(row['input_prompt']),
            input_prompt_attr=esc_attr(row['input_prompt']),
            output=esc_html(row['generated_output']),
            output_attr=esc_attr(row['generated_output']),
            grader_model=esc_html(row['grader_model']),
            grade=esc_html(row['grade']),
            cot=esc_attr(row['generated_cot']),
            prompt_preview=esc_attr(str(row['input_prompt'])[:50])
        )
        for _, row in df.iterrows()
    ])
    html = HTML_TEMPLATE.format(
        checkpoint=checkpoint_step,
        model=grader_model_filter or 'All',
        count=len(df),
        rows=rows
    )
    return raw_html(html)

tip

To convert a Python string into HTML output that can be rendered inside the widget, pass it to the raw_html() function and return the result from your main() function. You can then use the <style> tag to apply CSS styles and the <script> tag to include JavaScript code.
This example supports widget inputs. To make the widget react to viewer actions in the app, define a Checkpoint slider with values from 0 to 19 (the mock data script logs steps 0 through 19), a Grader model dropdown with two values, "gpt-4o" and "claude-3.5-sonnet", or both. The code looks up these inputs by the IDs checkpoint and grader_model.
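A much smaller version of the same pattern is sketched here: escape every value, build the markup as a string, and return it through raw_html(). The raw_html() function defined in the snippet is only a stand-in so that the code runs locally (in the widget, the function is provided for you). The render_rows helper and the sample record are hypothetical.

```python
import html


def raw_html(markup: str) -> str:
    # Stand-in for the raw_html() helper that the widget provides;
    # here it returns the string unchanged so the sketch runs locally.
    return markup


def render_rows(records: list[dict[str, str]]) -> str:
    """Build an HTML table, escaping every cell value."""
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(v))}</td>" for v in rec.values())
        + "</tr>"
        for rec in records
    )
    return f"<style>td {{ padding: 4px; }}</style><table>{body}</table>"


def main() -> str:
    return raw_html(render_rows([{"prompt": "a < b", "grade": "1"}]))


print(main())
```

Escaping with html.escape() keeps logged text, such as model outputs that contain angle brackets, from being interpreted as markup.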

note

For security reasons, some third-party JavaScript libraries might not work in your environment. For help, contact your administrator or Neptune support.
