
Getting started

Install the Adaption Python SDK, ingest data, run Adaptive Data adaptation jobs, and export results—same lifecycle in code as in the product.

Adaptive Data is designed so the full path from raw file to augmented output is repeatable in code: a short sequence of API calls, whether you run once in a notebook or at scale in a pipeline. This page uses the SDK’s convenience methods (upload, import, run, wait, download). For granular control—raw create, presigned upload steps, and every field—see the API Reference.

pip install adaption

from adaption import Adaption

client = Adaption(api_key="pt_live_...")
# Or set the ADAPTION_API_KEY environment variable and omit the argument

Ingest data from where it already lives—local files, Hugging Face, or Kaggle—without hand-built conversion scripts for supported formats. Each path returns a dataset_id you pass into datasets.run for the adapt step.

1. Upload a local file

upload_file creates the dataset, uploads via a presigned URL, and confirms the upload.

Supported extensions: .csv, .json, .jsonl, .parquet.

result = client.datasets.upload_file("training_data.csv")
print(result.dataset_id)

Optional custom name (defaults to the filename without extension):

result = client.datasets.upload_file("data.csv", name="my-dataset")
2. Import from Hugging Face

Point at a Hugging Face dataset URL and the file(s) to import.

resp = client.datasets.create_from_huggingface(
    url="https://huggingface.co/datasets/org/repo",
    files=["train.csv"],
)
print(resp.dataset_id)

Import runs asynchronously on the server. Poll status or call wait_for_completion() before you start an adaptation run so ingestion has finished and the dataset is ready to adapt.
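The import-then-wait-then-run sequence can be wrapped in one helper. This is a sketch built only from the methods shown in this guide; the helper name and the `column_mapping` argument passed through are illustrative:

```python
def import_and_adapt(client, url, files, column_mapping, timeout=1800):
    """Import a Hugging Face dataset, wait for ingestion, then start a run.

    `client` is an Adaption client; the method names are the ones shown
    in this guide. Raises DatasetTimeout if ingestion outlasts `timeout`.
    """
    resp = client.datasets.create_from_huggingface(url=url, files=files)
    # Block until the server-side import finishes, so the dataset is ready to adapt
    client.datasets.wait_for_completion(resp.dataset_id, timeout=timeout)
    return client.datasets.run(resp.dataset_id, column_mapping=column_mapping)
```

Called with your repo URL, file list, and column mapping, this returns the run handle with its `run_id`.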

3. Import from Kaggle

Use the Kaggle dataset page URL and the files to pull.

resp = client.datasets.create_from_kaggle(
    url="https://www.kaggle.com/datasets/org/dataset-name",
    files=["data.csv"],
)
print(resp.dataset_id)

Kaggle credentials must be registered in Adaption: open API keys settings (sign in if prompted) and add your Kaggle API credentials there before importing.


Adapt applies the platform’s recipes and optimization to your dataset. Map columns to the roles the API expects (prompt is required; others are optional). Call with estimate=True first to see estimated cost and duration before you commit—no surprises on large jobs.

run = client.datasets.run(
    dataset_id,
    column_mapping={
        "prompt": "instruction",          # required — your prompt column
        "completion": "response",         # optional — completion column
        # "chat": "conversation",         # optional — chat column
        # "context": ["source", "ref"],   # optional — context columns
    },
)
print(run.run_id)
print(run.estimated_credits_consumed)

Estimate cost without starting a run:

estimate = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    estimate=True,
)
print(f"Would cost {estimate.estimated_credits_consumed} credits")
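One practical use of the estimate is as a budget gate before committing a large job. A sketch under that assumption; the helper names and the `budget` threshold are illustrative, not part of the SDK:

```python
def within_budget(estimated_credits, budget):
    """True if an estimated run fits the credit budget."""
    return estimated_credits is not None and estimated_credits <= budget


def run_if_affordable(client, dataset_id, column_mapping, budget):
    """Estimate first; only start the real run when the estimate fits."""
    estimate = client.datasets.run(
        dataset_id, column_mapping=column_mapping, estimate=True
    )
    if not within_budget(estimate.estimated_credits_consumed, budget):
        raise RuntimeError(
            f"Estimated {estimate.estimated_credits_consumed} credits "
            f"exceeds budget of {budget}"
        )
    return client.datasets.run(dataset_id, column_mapping=column_mapping)
```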

from adaption import DatasetTimeout

try:
    status = client.datasets.wait_for_completion(dataset_id, timeout=600)
    print(f"Done: {status.status}")  # "succeeded" or "failed"
    if status.error:
        print(f"Error: {status.error.message}")
except DatasetTimeout as e:
    print(f"Still running after {e.timeout}s (last status: {e.last_status})")

Polling uses exponential backoff (2s → 4s → 8s → … up to 30s). Default timeout is one hour.
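If you want custom polling instead of `wait_for_completion`, the same schedule is easy to reproduce with `get_status`. A sketch assuming the doubling-and-cap behaviour described above; `backoff_delays` and `poll_manually` are hypothetical helpers, not SDK functions:

```python
import time


def backoff_delays(initial=2.0, cap=30.0):
    """Yield 2s, 4s, 8s, ... capped at 30s, matching the schedule above."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)


def poll_manually(client, dataset_id, timeout=3600):
    """Manual polling loop over client.datasets.get_status with backoff."""
    deadline = time.monotonic() + timeout
    for delay in backoff_delays():
        status = client.datasets.get_status(dataset_id)
        if status.status in ("succeeded", "failed"):
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"still {status.status} after {timeout}s")
        time.sleep(delay)
```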


Auto-paginated iterator:

for dataset in client.datasets.list():
    print(f"{dataset.dataset_id}: {dataset.status}")

With filters:

for dataset in client.datasets.list(status="succeeded", limit=10):
    print(dataset.name)

Export augmented data for training pipelines, evaluation harnesses, or downstream storage. download returns a presigned URL you can fetch with your HTTP client.

url = client.datasets.download(dataset_id)
# presigned S3 download URL
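Fetching the presigned URL needs no SDK; any HTTP client works because the URL carries its own authorization. A sketch using the standard library; the filename helper and `download_to_disk` are illustrative, not SDK methods:

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def filename_from_url(url):
    """Derive a local filename from the presigned URL's path component."""
    name = Path(urlparse(url).path).name
    return name or "dataset.out"


def download_to_disk(client, dataset_id):
    """Resolve the presigned URL, then fetch it to a local file."""
    url = client.datasets.download(dataset_id)
    dest = filename_from_url(url)
    # No auth header needed: the signature is embedded in the URL itself
    urllib.request.urlretrieve(url, dest)
    return dest
```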

import time

from adaption import Adaption, DatasetTimeout

client = Adaption(api_key="pt_live_...")

# 1. Upload
result = client.datasets.upload_file("training_data.csv")

# 2. Wait for file processing
while True:
    status = client.datasets.get_status(result.dataset_id)
    if status.row_count is not None:
        break
    time.sleep(2)

# 3. Adapt (start run)
run = client.datasets.run(
    result.dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
)
print(
    f"Run started: {run.run_id}, ~{run.estimated_minutes} min, "
    f"{run.estimated_credits_consumed} credits"
)

# 4. Wait for completion
try:
    final = client.datasets.wait_for_completion(result.dataset_id, timeout=1800)
    print(f"Finished: {final.status}")
except DatasetTimeout:
    print("Timed out — check status manually")

# 5. Download
url = client.datasets.download(result.dataset_id)
print(f"Download: {url}")

The convenience methods delegate to lower-level APIs (for example datasets.create, upload.initiate, upload.complete_by_id, get, get_status). Browse the API Reference for full signatures.
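For orientation, the upload convenience path roughly decomposes into those lower-level calls. This is only a sketch: the parameter names (`filename=`) and the `presigned_url` field are assumptions, so check the API Reference for the real signatures before relying on any of them:

```python
import urllib.request
from pathlib import Path


def upload_file_manually(client, path, name=None):
    """Approximate what upload_file does via the lower-level calls.

    Parameter and field names here are illustrative assumptions;
    the API Reference has the authoritative signatures.
    """
    # Default name mirrors upload_file: filename without extension
    dataset = client.datasets.create(name=name or Path(path).stem)
    upload = client.upload.initiate(
        dataset_id=dataset.dataset_id, filename=Path(path).name
    )
    with open(path, "rb") as f:
        # PUT the raw bytes to the presigned URL (assumed field name)
        req = urllib.request.Request(
            upload.presigned_url, data=f.read(), method="PUT"
        )
        urllib.request.urlopen(req)
    client.upload.complete_by_id(dataset_id=dataset.dataset_id)
    return client.datasets.get(dataset.dataset_id)
```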

Async variants exist for all methods:

from adaption import AsyncAdaption

client = AsyncAdaption(api_key="pt_live_...")

result = await client.datasets.upload_file("data.csv")
status = await client.datasets.wait_for_completion(result.dataset_id)

async for dataset in client.datasets.list():
    print(dataset.dataset_id)

Encode how data should behave—not only what it contains—via runs: