---
title: Getting started | Adaption
description: Install the Adaption Python SDK, ingest data, run Adaptive Data adaptation jobs, and export results—same lifecycle in code as in the product.
---

**Adaptive Data** is designed so the full path from raw file to augmented output is repeatable in code: a short sequence of API calls, whether you run once in a notebook or at scale in a pipeline.

This page uses the SDK's **convenience methods** (upload, import, run, wait, download). For granular control—raw `create`, presigned upload steps, and every field—see the [API Reference](/api/index.md).

## Install

```
pip install adaption
```

## Authentication

```
from adaption import Adaption

client = Adaption(api_key="pt_live_...")
# Or set the ADAPTION_API_KEY environment variable and omit the argument
```

---

## Ingest data (three ways)

**Ingest** data from where it already lives—local files, Hugging Face, or Kaggle—without hand-built conversion scripts for supported formats. Each path returns a `dataset_id` you pass into `datasets.run` for the **adapt** step.

### 1. Upload a local file

`upload_file` creates the dataset, uploads via a presigned URL, and confirms the upload. Supported extensions: **`.csv`**, **`.json`**, **`.jsonl`**, **`.parquet`**.

```
result = client.datasets.upload_file("training_data.csv")
print(result.dataset_id)
```

Optional custom name (defaults to the filename without extension):

```
result = client.datasets.upload_file("data.csv", name="my-dataset")
```

### 2. Import from Hugging Face

Point at a Hugging Face dataset URL and the file(s) to import.

```
resp = client.datasets.create_from_huggingface(
    url="https://huggingface.co/datasets/org/repo",
    files=["train.csv"],
)
print(resp.dataset_id)
```

Import runs **asynchronously on the server**. Poll status or call `wait_for_completion()` before you start an adaptation run so ingestion has finished and the dataset is ready to adapt.

### 3. Import from Kaggle

Use the Kaggle dataset page URL and the files to pull.

```
resp = client.datasets.create_from_kaggle(
    url="https://www.kaggle.com/datasets/org/dataset-name",
    files=["data.csv"],
)
print(resp.dataset_id)
```

**Kaggle credentials** must be registered in Adaption: open **[API keys settings](https://frontend-pharos-app-dev.vercel.app/app/settings?tab=api_keys)** (sign in if prompted) and add your Kaggle API credentials there before importing.

---

## Adapt: start a run

**Adapt** applies the platform's recipes and optimization to your dataset. Map columns to the roles the API expects (`prompt` is required; others are optional). Call with **`estimate=True`** first to see estimated cost and duration **before** you commit—no surprises on large jobs.

```
run = client.datasets.run(
    dataset_id,
    column_mapping={
        "prompt": "instruction",         # required — your prompt column
        "completion": "response",        # optional — completion column
        # "chat": "conversation",        # optional — chat column
        # "context": ["source", "ref"],  # optional — context columns
    },
)
print(run.run_id)
print(run.estimated_credits_consumed)
```

Estimate cost **without** starting a run:

```
estimate = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    estimate=True,
)
print(f"Would cost {estimate.estimated_credits_consumed} credits")
```

---

## Wait for completion

```
from adaption import DatasetTimeout

try:
    status = client.datasets.wait_for_completion(dataset_id, timeout=600)
    print(f"Done: {status.status}")  # "succeeded" or "failed"
    if status.error:
        print(f"Error: {status.error.message}")
except DatasetTimeout as e:
    print(f"Still running after {e.timeout}s (last status: {e.last_status})")
```

Polling uses exponential backoff (2s → 4s → 8s → … up to 30s). Default timeout is one hour.
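If you need the same polling behavior outside the SDK (for example in a monitoring script), the schedule above is easy to reproduce. A minimal sketch, assuming `get_status` returns a terminal `"succeeded"` or `"failed"` string; `backoff_delays` and `poll_until_done` are illustrative helpers, not part of the SDK:

```python
import time
from typing import Callable


def backoff_delays(initial: float = 2.0, factor: float = 2.0, cap: float = 30.0):
    """Yield poll intervals on the documented schedule: 2s, 4s, 8s, ... capped at 30s."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


def poll_until_done(get_status: Callable[[], str], timeout: float = 3600.0) -> str:
    """Poll get_status with exponential backoff until a terminal status or timeout."""
    deadline = time.monotonic() + timeout
    for delay in backoff_delays():
        status = get_status()
        if status in ("succeeded", "failed"):
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"last status: {status}")
        time.sleep(delay)


# Against the SDK it might be driven like this:
# poll_until_done(lambda: client.datasets.get_status(dataset_id).status)
```

This mirrors what `wait_for_completion` does for you; prefer the built-in method unless you need custom scheduling.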
---

## List datasets

Auto-paginated iterator:

```
for dataset in client.datasets.list():
    print(f"{dataset.dataset_id} — {dataset.status}")
```

With filters:

```
for dataset in client.datasets.list(status="succeeded", limit=10):
    print(dataset.name)
```

---

## Export: download results

**Export** augmented data for training pipelines, evaluation harnesses, or downstream storage. `download` returns a presigned URL you can fetch with your HTTP client.

```
url = client.datasets.download(dataset_id)  # presigned S3 download URL
```

---

## End-to-end example (file upload)

```
import time

from adaption import Adaption, DatasetTimeout

client = Adaption(api_key="pt_live_...")

# 1. Upload
result = client.datasets.upload_file("training_data.csv")

# 2. Wait for file processing
while True:
    status = client.datasets.get_status(result.dataset_id)
    if status.row_count is not None:
        break
    time.sleep(2)

# 3. Adapt (start run)
run = client.datasets.run(
    result.dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
)
print(
    f"Run started: {run.run_id}, ~{run.estimated_minutes} min, "
    f"{run.estimated_credits_consumed} credits"
)

# 4. Wait for completion
try:
    final = client.datasets.wait_for_completion(result.dataset_id, timeout=1800)
    print(f"Finished: {final.status}")
except DatasetTimeout:
    print("Timed out — check status manually")

# 5. Download
url = client.datasets.download(result.dataset_id)
print(f"Download: {url}")
```

---

## Low-level and async

The convenience methods delegate to lower-level APIs (for example `datasets.create`, `upload.initiate`, `upload.complete_by_id`, `get`, `get_status`). Browse the [API Reference](/api/index.md) for full signatures.
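The presigned URL from `datasets.download` needs no SDK or extra dependency to fetch; the Python standard library is enough. A minimal sketch, where the helper name and chunk size are illustrative choices of this example, not part of the SDK:

```python
import urllib.request


def save_presigned(url: str, dest_path: str) -> None:
    """Stream a presigned download URL to a local file in 64 KiB chunks."""
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        while chunk := resp.read(1 << 16):
            out.write(chunk)


# Usage, with client and dataset_id from the sections above:
# save_presigned(client.datasets.download(dataset_id), "adapted_output")
```

Streaming in chunks keeps memory flat even when the exported dataset is large.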
**Async** variants exist for all methods:

```
from adaption import AsyncAdaption

client = AsyncAdaption(api_key="pt_live_...")

result = await client.datasets.upload_file("data.csv")
status = await client.datasets.wait_for_completion(result.dataset_id)

async for dataset in client.datasets.list():
    print(dataset.dataset_id)
```

---

## Going further

Encode **how data should behave**—not only what it contains—via runs:

- [Processing large datasets](/guides/processing-large-datasets/index.md) — subsample with `job_specification.max_rows` before a full run.
- [Evaluating dataset quality](/guides/evaluating-dataset-quality/index.md) — read evaluation status and metrics after a run.
- [Mitigating hallucinations](/guides/mitigating-hallucinations/index.md) — grounding and fact-aligned completions.
- [Reasoning traces](/guides/reasoning-traces/index.md) — recipe-driven reasoning for auditable, trainable outputs.
- [Safety and length constraints](/guides/safety-and-length-constraints/index.md) — length and safety as **specification**, aligned with [Blueprint](https://www.adaptionlabs.ai/blog/blueprint).
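The first guide above mentions `job_specification.max_rows`. Assuming `datasets.run` accepts a `job_specification` mapping as that guide describes (an assumption; check the guide for the authoritative shape), a subsample run could be sketched by building the call's keyword arguments first. The helper below is illustrative, not part of the SDK:

```python
def subsample_run_kwargs(dataset_id: str, column_mapping: dict, max_rows: int) -> dict:
    """Build keyword arguments for a run limited to max_rows rows.

    Assumes datasets.run accepts a job_specification mapping with a
    max_rows key, as described in the large-datasets guide.
    """
    return {
        "dataset_id": dataset_id,
        "column_mapping": column_mapping,
        "job_specification": {"max_rows": max_rows},
    }


# Hypothetical usage for a quick trial before a full run:
# run = client.datasets.run(**subsample_run_kwargs(dataset_id, {"prompt": "instruction"}, 1000))
```

Pair this with `estimate=True` from the adapt section to bound both rows and cost before committing.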