Getting started
Install the Adaption Python SDK, ingest data, run Adaptive Data adaptation jobs, and export results—same lifecycle in code as in the product.
Adaptive Data is designed so the full path from raw file to augmented output is repeatable in code: a short sequence of API calls, whether you run once in a notebook or at scale in a pipeline. This page uses the SDK’s convenience methods (upload, import, run, wait, download). For granular control—raw create, presigned upload steps, and every field—see the API Reference.
Install
```shell
pip install adaption
```

Authentication
```python
from adaption import Adaption

client = Adaption(api_key="pt_live_...")
# Or set the ADAPTION_API_KEY environment variable and omit the argument
```

Ingest data (three ways)
Ingest data from where it already lives—local files, Hugging Face, or Kaggle—without hand-built conversion scripts for supported formats. Each path returns a dataset_id you pass into datasets.run for the adapt step.
1. Upload a local file
upload_file creates the dataset, uploads via a presigned URL, and confirms the upload.
Supported extensions: .csv, .json, .jsonl, .parquet.
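If you assemble uploads programmatically, a quick local guard against unsupported extensions fails fast before any network call. The `is_supported` helper below is a sketch of our own, not part of the SDK:

```python
from pathlib import Path

# Extensions the upload endpoint accepts, per the list above
SUPPORTED_EXTENSIONS = {".csv", ".json", ".jsonl", ".parquet"}

def is_supported(path: str) -> bool:
    """Return True if the file's extension is one Adaptive Data accepts."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported("training_data.csv")` is True while `is_supported("notes.txt")` is False.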
```python
result = client.datasets.upload_file("training_data.csv")
print(result.dataset_id)
```

Optional custom name (defaults to the filename without extension):
```python
result = client.datasets.upload_file("data.csv", name="my-dataset")
```

2. Import from Hugging Face
Point at a Hugging Face dataset URL and the file(s) to import.
```python
resp = client.datasets.create_from_huggingface(
    url="https://huggingface.co/datasets/org/repo",
    files=["train.csv"],
)
print(resp.dataset_id)
```

Import runs asynchronously on the server. Poll status or call wait_for_completion() before you start an adaptation run, so that ingestion has finished and the dataset is ready to adapt.
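If you prefer explicit polling over wait_for_completion, the loop is simple. The helper below is a generic sketch of ours, not the SDK's; in practice you would pass something like `lambda: client.datasets.get_status(resp.dataset_id).status`:

```python
import time

def poll_until_done(get_status, terminal=("succeeded", "failed"),
                    interval=2.0, timeout=600.0):
    """Call get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status()
        if state in terminal:
            return state
        time.sleep(interval)
    raise TimeoutError(f"still not terminal after {timeout}s")
```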
3. Import from Kaggle
Use the Kaggle dataset page URL and the files to pull.
```python
resp = client.datasets.create_from_kaggle(
    url="https://www.kaggle.com/datasets/org/dataset-name",
    files=["data.csv"],
)
print(resp.dataset_id)
```

Kaggle credentials must be registered in Adaption: open API keys settings (sign in if prompted) and add your Kaggle API credentials there before importing.
Adapt: start a run
Adapt applies the platform’s recipes and optimization to your dataset. Map columns to the roles the API expects (prompt is required; the others are optional). Call with estimate=True first to see the estimated cost and duration before you commit—no surprises on large jobs.
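Before starting a run, it can save a failed job to confirm locally that every column you are about to map actually exists in the file. The check below is a sketch of ours, not an SDK feature; it handles both mapping shapes (a single column name, or a list of names for context):

```python
import csv

def missing_columns(csv_path: str, column_mapping: dict) -> list:
    """Return mapped column names that are absent from the CSV header."""
    with open(csv_path, newline="") as f:
        header = set(next(csv.reader(f)))
    wanted = []
    for value in column_mapping.values():
        wanted.extend(value if isinstance(value, list) else [value])
    return [name for name in wanted if name not in header]
```

An empty result means the mapping is consistent with the file and safe to pass to run.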
```python
run = client.datasets.run(
    dataset_id,
    column_mapping={
        "prompt": "instruction",        # required — your prompt column
        "completion": "response",       # optional — completion column
        # "chat": "conversation",       # optional — chat column
        # "context": ["source", "ref"], # optional — context columns
    },
)
print(run.run_id)
print(run.estimated_credits_consumed)
```

Estimate cost without starting a run:
```python
estimate = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    estimate=True,
)
print(f"Would cost {estimate.estimated_credits_consumed} credits")
```

Wait for completion
Section titled “Wait for completion”from adaption import DatasetTimeout
try: status = client.datasets.wait_for_completion(dataset_id, timeout=600) print(f"Done: {status.status}") # "succeeded" or "failed" if status.error: print(f"Error: {status.error.message}")except DatasetTimeout as e: print(f"Still running after {e.timeout}s (last status: {e.last_status})")Polling uses exponential backoff (2s → 4s → 8s → … up to 30s). Default timeout is one hour.
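The schedule described above (doubling from 2s, capped at 30s) looks like this. The helper is purely illustrative, since wait_for_completion handles backoff for you:

```python
def backoff_delays(attempts: int, base: float = 2.0, cap: float = 30.0) -> list:
    """Delays for the documented schedule: 2s, 4s, 8s, ... capped at 30s."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]

print(backoff_delays(6))  # [2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```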
List datasets
Auto-paginated iterator:

```python
for dataset in client.datasets.list():
    print(f"{dataset.dataset_id} — {dataset.status}")
```

With filters:
```python
for dataset in client.datasets.list(status="succeeded", limit=10):
    print(dataset.name)
```

Export: download results
Export augmented data for training pipelines, evaluation harnesses, or downstream storage. download returns a presigned URL you can fetch with your HTTP client.
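Any HTTP client works for the presigned URL. Here is a standard-library sketch that streams the response to disk; `fetch_presigned` is a helper of ours, not part of the SDK:

```python
import urllib.request
from pathlib import Path

def fetch_presigned(url: str, dest: str) -> Path:
    """Stream a (presigned) URL to a local file in 64 KiB chunks."""
    path = Path(dest)
    with urllib.request.urlopen(url) as resp, path.open("wb") as out:
        while chunk := resp.read(1 << 16):
            out.write(chunk)
    return path
```

Pass it the URL returned by the download call; requests or httpx work just as well.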
```python
url = client.datasets.download(dataset_id)  # presigned S3 download URL
```

End-to-end example (file upload)
```python
import time

from adaption import Adaption, DatasetTimeout

client = Adaption(api_key="pt_live_...")

# 1. Upload
result = client.datasets.upload_file("training_data.csv")

# 2. Wait for file processing
while True:
    status = client.datasets.get_status(result.dataset_id)
    if status.row_count is not None:
        break
    time.sleep(2)

# 3. Adapt (start run)
run = client.datasets.run(
    result.dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
)
print(
    f"Run started: {run.run_id}, ~{run.estimated_minutes} min, "
    f"{run.estimated_credits_consumed} credits"
)

# 4. Wait for completion
try:
    final = client.datasets.wait_for_completion(result.dataset_id, timeout=1800)
    print(f"Finished: {final.status}")
except DatasetTimeout:
    print("Timed out — check status manually")

# 5. Download
url = client.datasets.download(result.dataset_id)
print(f"Download: {url}")
```

Low-level and async
The convenience methods delegate to lower-level APIs (for example datasets.create, upload.initiate, upload.complete_by_id, get, get_status). Browse the API Reference for full signatures.
Async variants exist for all methods:
```python
from adaption import AsyncAdaption

client = AsyncAdaption(api_key="pt_live_...")

result = await client.datasets.upload_file("data.csv")
status = await client.datasets.wait_for_completion(result.dataset_id)

async for dataset in client.datasets.list():
    print(dataset.dataset_id)
```

Going further
Encode how data should behave—not only what it contains—via runs:
- Processing large datasets — subsample with job_specification.max_rows before a full run.
- Evaluating dataset quality — read evaluation status and metrics after a run.
- Mitigating hallucinations — grounding and fact-aligned completions.
- Reasoning traces — recipe-driven reasoning for auditable, trainable outputs.
- Safety and length constraints — length and safety as specification, aligned with Blueprint.