
Getting started

Install the Adaption Python SDK, ingest data, run Adaptive Data adaptation jobs, and export results—same lifecycle in code as in the product.

Adaptive Data is designed so the full path from raw file to augmented output is repeatable in code: a short sequence of API calls, whether you run once in a notebook or at scale in a pipeline. This page uses the SDK’s convenience methods (upload, import, run, wait, download). For granular control—raw create, presigned upload steps, and every field—see the API Reference.

pip install adaption

from adaption import Adaption

client = Adaption(api_key="pt_live_...")
# Or set the ADAPTION_API_KEY environment variable and omit the argument

Ingest data from where it already lives—local files, Hugging Face, or Kaggle—without hand-built conversion scripts for supported formats. Each path returns a dataset_id you pass into datasets.run for the adapt step.

1. Upload a local file

upload_file creates the dataset, uploads via a presigned URL, and confirms the upload.

Supported extensions: .csv, .json, .jsonl, .parquet.

result = client.datasets.upload_file("training_data.csv")
print(result.dataset_id)

Optional custom name (defaults to the filename without extension):

result = client.datasets.upload_file("data.csv", name="my-dataset")
2. Import from Hugging Face

Point at a Hugging Face dataset URL and the file(s) to import.

resp = client.datasets.create_from_huggingface(
    url="https://huggingface.co/datasets/org/repo",
    files=["train.csv"],
)
print(resp.dataset_id)

Import runs asynchronously on the server. Poll status or call wait_for_completion() before you start an adaptation run so ingestion has finished and the dataset is ready to adapt.
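The import-then-wait-then-run sequence can be wrapped in one helper. This is a sketch built only from the methods shown in this guide; the helper name and the `column_mapping` argument passed through are illustrative:

```python
def import_and_adapt(client, url, files, column_mapping, timeout=1800):
    """Import a Hugging Face dataset, wait for ingestion, then start a run.

    `client` is an Adaption client; the method names are the ones shown
    in this guide. Raises DatasetTimeout if ingestion outlasts `timeout`.
    """
    resp = client.datasets.create_from_huggingface(url=url, files=files)
    # Block until the server-side import finishes, so the dataset is ready to adapt
    client.datasets.wait_for_completion(resp.dataset_id, timeout=timeout)
    return client.datasets.run(resp.dataset_id, column_mapping=column_mapping)
```

Called with your repo URL, file list, and column mapping, this returns the run handle with its `run_id`.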

3. Import from Kaggle

Use the Kaggle dataset page URL and the files to pull.

resp = client.datasets.create_from_kaggle(
    url="https://www.kaggle.com/datasets/org/dataset-name",
    files=["data.csv"],
)
print(resp.dataset_id)

Kaggle credentials must be registered in Adaption: open API keys settings (sign in if prompted) and add your Kaggle API credentials there before importing.


Adapt applies the platform’s recipes and optimization to your dataset. Map columns to the roles the API expects (prompt is required; others are optional). Call with estimate=True first to see estimated cost and duration before you commit—no surprises on large jobs.

run = client.datasets.run(
    dataset_id,
    column_mapping={
        "prompt": "instruction",          # required — your prompt column
        "completion": "response",         # optional — completion column
        # "chat": "conversation",         # optional — chat column
        # "context": ["source", "ref"],   # optional — context columns
    },
)
print(run.run_id)
print(run.estimated_credits_consumed)

Estimate cost without starting a run:

estimate = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    estimate=True,
)
print(f"Would cost {estimate.estimated_credits_consumed} credits")
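One practical use of the estimate is as a budget gate before committing a large job. A sketch under that assumption; the helper names and the `budget` threshold are illustrative, not part of the SDK:

```python
def within_budget(estimated_credits, budget):
    """True if an estimated run fits the credit budget."""
    return estimated_credits is not None and estimated_credits <= budget


def run_if_affordable(client, dataset_id, column_mapping, budget):
    """Estimate first; only start the real run when the estimate fits."""
    estimate = client.datasets.run(
        dataset_id, column_mapping=column_mapping, estimate=True
    )
    if not within_budget(estimate.estimated_credits_consumed, budget):
        raise RuntimeError(
            f"Estimated {estimate.estimated_credits_consumed} credits "
            f"exceeds budget of {budget}"
        )
    return client.datasets.run(dataset_id, column_mapping=column_mapping)
```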

from adaption import DatasetTimeout

try:
    status = client.datasets.wait_for_completion(dataset_id, timeout=600)
    print(f"Done: {status.status}")  # "succeeded" or "failed"
    if status.error:
        print(f"Error: {status.error.message}")
except DatasetTimeout as e:
    print(f"Still running after {e.timeout}s (last status: {e.last_status})")

Polling uses exponential backoff (2s → 4s → 8s → … up to 30s). Default timeout is one hour.
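If you want custom polling instead of `wait_for_completion`, the same schedule is easy to reproduce with `get_status`. A sketch assuming the doubling-and-cap behaviour described above; `backoff_delays` and `poll_manually` are hypothetical helpers, not SDK functions:

```python
import time


def backoff_delays(initial=2.0, cap=30.0):
    """Yield 2s, 4s, 8s, ... capped at 30s, matching the schedule above."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)


def poll_manually(client, dataset_id, timeout=3600):
    """Manual polling loop over client.datasets.get_status with backoff."""
    deadline = time.monotonic() + timeout
    for delay in backoff_delays():
        status = client.datasets.get_status(dataset_id)
        if status.status in ("succeeded", "failed"):
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"still {status.status} after {timeout}s")
        time.sleep(delay)
```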


Auto-paginated iterator:

for dataset in client.datasets.list():
    print(f"{dataset.dataset_id}: {dataset.status}")

With filters:

for dataset in client.datasets.list(status="succeeded", limit=10):
    print(dataset.name)

Export augmented data for training pipelines, evaluation harnesses, or downstream storage. download returns a presigned URL you can fetch with your HTTP client.

url = client.datasets.download(dataset_id)
# presigned S3 download URL
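Fetching the presigned URL needs no SDK; any HTTP client works because the URL carries its own authorization. A sketch using the standard library; the filename helper and `download_to_disk` are illustrative, not SDK methods:

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def filename_from_url(url):
    """Derive a local filename from the presigned URL's path component."""
    name = Path(urlparse(url).path).name
    return name or "dataset.out"


def download_to_disk(client, dataset_id):
    """Resolve the presigned URL, then fetch it to a local file."""
    url = client.datasets.download(dataset_id)
    dest = filename_from_url(url)
    # No auth header needed: the signature is embedded in the URL itself
    urllib.request.urlretrieve(url, dest)
    return dest
```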

import time

from adaption import Adaption, DatasetTimeout

client = Adaption(api_key="pt_live_...")

# 1. Upload
result = client.datasets.upload_file("training_data.csv")

# 2. Wait for file processing
while True:
    status = client.datasets.get_status(result.dataset_id)
    if status.row_count is not None:
        break
    time.sleep(2)

# 3. Adapt (start run)
run = client.datasets.run(
    result.dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
)
print(
    f"Run started: {run.run_id}, ~{run.estimated_minutes} min, "
    f"{run.estimated_credits_consumed} credits"
)

# 4. Wait for completion
try:
    final = client.datasets.wait_for_completion(result.dataset_id, timeout=1800)
    print(f"Finished: {final.status}")
except DatasetTimeout:
    print("Timed out — check status manually")

# 5. Download
url = client.datasets.download(result.dataset_id)
print(f"Download: {url}")

The convenience methods delegate to lower-level APIs (for example datasets.create, upload.initiate, upload.complete_by_id, get, get_status). Browse the API Reference for full signatures.
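For orientation, the upload convenience path roughly decomposes into those lower-level calls. This is only a sketch: the parameter names (`filename=`) and the `presigned_url` field are assumptions, so check the API Reference for the real signatures before relying on any of them:

```python
import urllib.request
from pathlib import Path


def upload_file_manually(client, path, name=None):
    """Approximate what upload_file does via the lower-level calls.

    Parameter and field names here are illustrative assumptions;
    the API Reference has the authoritative signatures.
    """
    # Default name mirrors upload_file: filename without extension
    dataset = client.datasets.create(name=name or Path(path).stem)
    upload = client.upload.initiate(
        dataset_id=dataset.dataset_id, filename=Path(path).name
    )
    with open(path, "rb") as f:
        # PUT the raw bytes to the presigned URL (assumed field name)
        req = urllib.request.Request(
            upload.presigned_url, data=f.read(), method="PUT"
        )
        urllib.request.urlopen(req)
    client.upload.complete_by_id(dataset_id=dataset.dataset_id)
    return client.datasets.get(dataset.dataset_id)
```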

Async variants exist for all methods:

from adaption import AsyncAdaption

client = AsyncAdaption(api_key="pt_live_...")

result = await client.datasets.upload_file("data.csv")
status = await client.datasets.wait_for_completion(result.dataset_id)

async for dataset in client.datasets.list():
    print(dataset.dataset_id)

Encode how data should behave—not only what it contains—via runs: