
Evaluating dataset quality

Read evaluation status and quality metrics for a processed dataset.

After an adaptation run finishes successfully, the platform can produce evaluation signals—scores and related metrics that summarize how the augmented data compares to the original on quality dimensions the pipeline measures.

In the Python SDK you retrieve that information in two complementary ways:

  • datasets.get_evaluation(dataset_id) — dedicated response with evaluation pipeline status and structured quality metrics.
  • datasets.get(dataset_id) — full dataset record including evaluation_summary, a compact mirror of the headline metrics when evaluation has finished.

Call get_evaluation with the same dataset_id you adapted:

ev = client.datasets.get_evaluation(dataset_id)
print(ev.status)  # pending | running | succeeded | failed | skipped
if ev.quality:
    print(f"Score before: {ev.quality.score_before}")
    print(f"Score after: {ev.quality.score_after}")
    print(f"Improvement: {ev.quality.improvement_percent}%")

When status is succeeded, quality includes fields such as score_before / score_after (0–10 scale), letter grades, improvement_percent, and percentile_after where applicable. If evaluation is still pending or running, expect quality to be absent until the pipeline finishes.
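Those fields can be folded into a one-line report for logs or dashboards. The helper below is a local sketch, not an SDK method; it reads percentile_after defensively with getattr since that field is only present where applicable:

```python
def summarize_quality(quality):
    """Build a one-line summary from a succeeded evaluation's quality metrics.

    score_before / score_after are on the 0-10 scale; percentile_after may be
    absent, so it is read defensively rather than accessed directly.
    """
    parts = [
        f"{quality.score_before:.1f} -> {quality.score_after:.1f}",
        f"(+{quality.improvement_percent:.0f}%)",
    ]
    percentile = getattr(quality, "percentile_after", None)
    if percentile is not None:
        parts.append(f"p{percentile}")
    return " ".join(parts)
```

Guarding the call with `if ev.status == "succeeded" and ev.quality:` keeps it safe while evaluation is still pending or running.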

datasets.get returns a Dataset whose evaluation_summary is populated when a compact summary is available—useful for dashboards or listing datasets without a second request:

ds = client.datasets.get(dataset_id)
if ds.evaluation_summary:
    print(ds.evaluation_summary.score_after, ds.evaluation_summary.improvement_percent)

get_status focuses on ingestion/run progress and does not include evaluation; use get or get_evaluation when you care about quality metrics.

Adaptation may show succeeded before evaluation is done. Poll get_evaluation (or get if you only need evaluation_summary) until status is no longer pending or running:

import time

while True:
    ev = client.datasets.get_evaluation(dataset_id)
    if ev.status in ("succeeded", "failed", "skipped"):
        break
    time.sleep(5)

if ev.status == "succeeded" and ev.quality:
    print(ev.quality.model_dump(exclude_none=True))

Adjust the polling interval to match your environment (notebooks vs CI), and add a timeout so an unattended job fails fast if evaluation stalls.
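One way to package the loop with both knobs is a small local helper (wait_for_evaluation is not an SDK method; the terminal statuses are the ones listed above):

```python
import time


def wait_for_evaluation(client, dataset_id, interval=5.0, timeout=300.0):
    """Poll get_evaluation until it reaches a terminal status or timeout expires.

    Returns the final evaluation response; raises TimeoutError if the
    evaluation is still pending/running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        ev = client.datasets.get_evaluation(dataset_id)
        if ev.status in ("succeeded", "failed", "skipped"):
            return ev
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"evaluation for {dataset_id} still {ev.status!r} after {timeout}s"
            )
        time.sleep(interval)
```

In a notebook a long timeout and a short interval are fine; in CI, prefer a tight timeout so a stalled pipeline surfaces as a failed job rather than a hung build.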

Async clients use the same shape: await client.datasets.get_evaluation(dataset_id).
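A minimal async sketch of that shape, wrapped in a coroutine so it can run under asyncio. How the async client is constructed is an assumption here (check the SDK for the actual async client export); only the awaited call mirrors the documented method:

```python
import asyncio


async def fetch_evaluation_status(client, dataset_id):
    """Await the evaluation response and return its status string.

    `client` is assumed to be the SDK's async client, whose
    datasets.get_evaluation coroutine has the same shape as the sync call.
    """
    ev = await client.datasets.get_evaluation(dataset_id)
    return ev.status
```

Call it with `asyncio.run(fetch_evaluation_status(client, dataset_id))` from sync code, or await it directly inside an existing event loop.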

After a completed run (see Getting started), pull evaluation:

import os

from adaption import Adaption

client = Adaption(api_key=os.environ["ADAPTION_API_KEY"])
dataset_id = os.environ["ADAPTION_DATASET_ID"]

ev = client.datasets.get_evaluation(dataset_id)
print(f"Evaluation status: {ev.status}")

ds = client.datasets.get(dataset_id)
print(f"Dataset status: {ds.status}")
if ds.evaluation_summary:
    print(f"Summary: {ds.evaluation_summary.model_dump(exclude_none=True)}")

Use get_evaluation when you need explicit evaluation status and full quality details. Use get when you already fetch the dataset and want a single summary on the same object. Pair either approach with estimate=True on future runs (see Processing large datasets and the FAQ) when you are iterating on quality before scaling row counts.